Bug 40668 - 64-bit sparc miscompiles memcpy of argument inside switch
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.0
Importance: P3 normal
Target Milestone: 4.4.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-07 05:56 UTC by Ben Pfaff
Modified: 2010-09-20 21:46 UTC
CC List: 4 users

See Also:
Host: sparc64-unknown-linux-gnu
Target: sparc64-unknown-linux-gnu
Build: sparc64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed:


Attachments
preprocessed test input (3.58 KB, text/plain)
2009-07-07 05:57 UTC, Ben Pfaff
test program (before preprocessing) (316 bytes, text/plain)
2009-07-07 05:58 UTC, Ben Pfaff
gcc44-pr40668.patch (761 bytes, patch)
2009-07-07 19:05 UTC, Jakub Jelinek

Description Ben Pfaff 2009-07-07 05:56:37 UTC
The following program should print 12345678.  When it is built with "-O2 -m64" or "-O3 -fno-inline -m64" on sparc, it instead prints 0.

I ran this test with the command "/opt/cfarm/release/4.4.0/bin/gcc -fno-inline -O3 -m64 -g combined.c" on gcc62 in the GCC compile farm.

----------------------------------------------------------------------

"/opt/cfarm/release/4.4.0/bin/gcc -v" prints:

    Using built-in specs.
    Target: sparc64-unknown-linux-gnu
    Configured with: ../gcc-4.4.0/configure --enable-languages=c,c++,fortran,ada --prefix=/opt/cfarm/release/4.4.0 --enable-__cxa_atexit --enable-threads=posix --disable-nls --with-mpfr=/opt/cfarm/mpfr-2.4.1 --with-gmp=/opt/cfarm/gmp-4.2.4 --with-cpu=v8
    Thread model: posix
    gcc version 4.4.0 (GCC) 

----------------------------------------------------------------------

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stores 32-bit unsigned integer X at P,
   which need not be aligned. */
static void
put_uint32 (uint32_t x, void *p)
{
  memcpy (p, &x, sizeof x);
}

void
store_12345678 (int type, void *number)
{
  switch (type)
    {
    case 1:
      printf ("got here\n");
      put_uint32 (0x12345678, number);
      break;

    case 7:
      put_uint32 (0, number);
      break;
    case 8:
      put_uint32 (0, number);
      break;
    case 9:
      put_uint32 (0, number);
      break;
    }
}

int
main (void)
{
  uint32_t x;
  store_12345678 (1, &x);
  printf ("%x\n", (unsigned int) x);
  return 0;
}
Comment 1 Ben Pfaff 2009-07-07 05:57:52 UTC
Created attachment 18147
preprocessed test input
Comment 2 Ben Pfaff 2009-07-07 05:58:47 UTC
Created attachment 18148
test program (before preprocessing)
Comment 3 Mikael Pettersson 2009-07-07 11:35:51 UTC
Confirmed: it works with gcc-4.3-20090705 and fails with gcc-4.4-20090630. Compiling with -S and comparing the .s files, it looks like 4.4 completely mis-schedules the code for put_uint32:

put_uint32:
        .register       %g2, #scratch
        .register       %g3, #scratch
        ldub    [%sp+2175], %g1
        ldub    [%sp+2176], %g3
        ldub    [%sp+2177], %g2
        ldub    [%sp+2178], %g4
        st      %o0, [%sp+2175]
        stb     %g4, [%o1+3]
        stb     %g1, [%o1]
        stb     %g3, [%o1+1]
        jmp     %o7+8
         stb    %g2, [%o1+2]

Notice how the store of %o0 to the four bytes at %sp+2175 comes after the corresponding byte loads, so %g1 to %g4 are loaded with garbage, likely zeroes.

In contrast, gcc-4.3 generates the store before the loads:

put_uint32:
        .register       %g2, #scratch
        .register       %g3, #scratch
        st      %o0, [%sp+2175]
        ldub    [%sp+2176], %g3
        ldub    [%sp+2177], %g4
        ldub    [%sp+2178], %g2
        ldub    [%sp+2175], %g1
        stb     %g2, [%o1+3]
        stb     %g1, [%o1]
        stb     %g3, [%o1+1]
        jmp     %o7+8
         stb    %g4, [%o1+2]
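
To make the ordering problem concrete, here is a minimal C sketch that replays the two instruction orders above on a plain byte array. It is only a model, not GCC output: the function name replay_put_uint32 and its store_first flag are hypothetical, and the slot is assumed to start out zeroed, which matches the "likely zeroes" guess but is not guaranteed on real hardware.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of put_uint32's stack slot (not GCC output).
   store_first != 0 replays the gcc-4.3 order, 0 replays the gcc-4.4 order. */
static uint32_t
replay_put_uint32 (uint32_t x, int store_first)
{
  /* Stands in for the four bytes at [%sp+2175]; assumed to start zeroed. */
  unsigned char slot[4] = { 0, 0, 0, 0 };
  /* Stands in for %g1..%g4. */
  unsigned char regs[4];
  uint32_t out;

  if (store_first)
    memcpy (slot, &x, sizeof x);      /* st %o0, [%sp+2175] */

  memcpy (regs, slot, sizeof regs);   /* the four ldub instructions */

  if (!store_first)
    memcpy (slot, &x, sizeof x);      /* same store, but after the loads */

  memcpy (&out, regs, sizeof out);    /* the four stb instructions to [%o1] */
  return out;
}
```

With these assumptions, replay_put_uint32 (0x12345678, 1) returns 0x12345678, while replay_put_uint32 (0x12345678, 0) returns 0, matching the output reported in the description.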
Comment 4 Mikael Pettersson 2009-07-07 16:28:39 UTC
A reghunt identified Jakub's (added to cc: list) r142481 (PR38367 fix) as the source of this regression.
Comment 5 Jakub Jelinek 2009-07-07 19:05:15 UTC
Created attachment 18151
gcc44-pr40668.patch

Untested patch that fixes this testcase.  I believe my commit was correct, but apparently the stack parameter's MEM can be modified later on without adjusting MEM_OFFSET.
I don't have a working SPARC box around ATM, so I can't bootstrap/regtest it there.
Comment 6 Mikael Pettersson 2009-07-07 23:10:05 UTC
(In reply to comment #5)
> Created an attachment (id=18151)
> gcc44-pr40668.patch
> 
> Untested patch that fixes this testcase.

Thanks. This fixes the issue in a cross-compiler to sparc64-linux. I'm currently bootstrapping 4.4-20090630 plus this patch on an Ultra5, and I'll follow up once that's complete (it will take quite a while).

Comment 7 Mikael Pettersson 2009-07-08 16:43:47 UTC
4.4-20090630 plus this fix bootstrapped fine, fixed the test case, built a working 2.6.31-rc2 Linux kernel, and built a working Erlang VM.
Comment 8 Ben Pfaff 2009-07-08 17:30:03 UTC
Wow, that's amazingly fast turnaround.  Thanks so much guys!
Comment 9 Jakub Jelinek 2009-07-11 09:23:48 UTC
Subject: Bug 40668

Author: jakub
Date: Sat Jul 11 09:23:32 2009
New Revision: 149511

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149511
Log:
	PR target/40668
	* function.c (assign_parm_setup_stack): Adjust
	MEM_OFFSET (data->stack_parm) if promoted_mode is different
	from nominal_mode on big endian.

	* gcc.c-torture/execute/pr40668.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/pr40668.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/function.c
    trunk/gcc/testsuite/ChangeLog
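
One way to picture why an offset adjustment is tied to big endian: when a narrow value is widened into a larger slot, its low-order bytes sit at the end of the slot, not the start. The sketch below is not the patch itself; it assumes, purely for illustration, a 32-bit value promoted into an 8-byte slot, and the helper names (be_lowpart_offset, store_be64, load_be32) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of the low-order `narrow` bytes
   inside a `wide`-byte slot laid out big-endian. */
static size_t
be_lowpart_offset (size_t wide, size_t narrow)
{
  return wide - narrow;
}

/* Write v into slot[0..7] most-significant byte first,
   the way a big-endian target lays out a 64-bit value. */
static void
store_be64 (unsigned char slot[8], uint64_t v)
{
  for (int i = 0; i < 8; i++)
    slot[i] = (unsigned char) (v >> (8 * (7 - i)));
}

/* Read four bytes starting at `offset`, most-significant byte first. */
static uint32_t
load_be32 (const unsigned char *slot, size_t offset)
{
  uint32_t v = 0;
  for (int i = 0; i < 4; i++)
    v = (v << 8) | slot[offset + i];
  return v;
}
```

After store_be64 (slot, 0x12345678), reading with load_be32 (slot, be_lowpart_offset (8, 4)) yields 0x12345678, whereas load_be32 (slot, 0) yields 0. Bookkeeping that records the wrong offset for such a slot would describe a different set of bytes than the ones actually accessed.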

Comment 10 Jakub Jelinek 2009-07-11 09:26:35 UTC
Subject: Bug 40668

Author: jakub
Date: Sat Jul 11 09:26:23 2009
New Revision: 149512

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149512
Log:
	PR target/40668
	* function.c (assign_parm_setup_stack): Adjust
	MEM_OFFSET (data->stack_parm) if promoted_mode is different
	from nominal_mode on big endian.

	* gcc.c-torture/execute/pr40668.c: New test.

Added:
    branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/execute/pr40668.c
Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/function.c
    branches/gcc-4_4-branch/gcc/testsuite/ChangeLog

Comment 11 Eric Botcazou 2010-09-20 21:46:24 UTC
By Jakub.