Bug 55940 - [4.7 Regression] Incorrect code for accessing parameters with 32-bit Intel hosts
Summary: [4.7 Regression] Incorrect code for accessing parameters with 32-bit Intel hosts
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.7.2
: P3 normal
Target Milestone: 4.7.3
Assignee: Jakub Jelinek
Depends on:
Reported: 2013-01-11 11:16 UTC by Frank Mehnert
Modified: 2013-02-01 14:30 UTC (History)
1 user (show)

See Also:
Target: i?86-*-*
Known to work:
Known to fail:
Last reconfirmed: 2013-01-11 00:00:00

Preprocessed source (213.34 KB, application/x-gzip)
2013-01-15 16:54 UTC, Frank Mehnert

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Mehnert 2013-01-11 11:16:49 UTC
There are some reports about incorrect compiled Linux kernel modules for VirtualBox. I've debugged one report (see here: https://www.virtualbox.org/ticket/11035) and saw that the compiler generated invalid code for accessing parameters inside a function. When it should read the parameter from the stack it reads the parameter from a register.

I don't know exactly under which circumstance this happens but this seems to be related to 32-bit x86 targets. The function VBoxHost_RTR0MemObjGetPagePhysAddr is marked as __attribute__((cdelc, regparm(0))). The pre-processed memobj-r0drv.i file is attached to that ticket. The generated code is part of the vboxdrv.ko file which is also attached to the ticket.

The following code is generated to access the first function parameter (keep cdecl in mind!):

    955e:       8b 0f                   mov    (%edi),%ecx
    9560:       8b 47 04                mov    0x4(%edi),%eax
    9563:       8d 91 00 10 00 00       lea    0x1000(%ecx),%edx

So the function is using the EDI register to access the parameter while it should read the parameter from the stack.

The C code of this function looks (see memobj-r0drv.i) is:

RTHCPHYS __attribute__((cdecl,regparm(0))) VBoxHost_RTR0MemObjGetPagePhysAddr(RTR0MEMOBJ MemObj, size_t iPage)

    size_t cPages;
    do { if (__builtin_expect(!!(!(( (uintptr_t)(MemObj) + 0x1000U >= 0x2000U ))), 0)) return ((~(RTHCPHYS)0)); } while (0);

(Explanation of the code: This is actually a sanity check if the pointer is valid; the value must be either less than 0xFFFFF000U or greater than 0x00000FFFU).

Unfortunately I cannot reproduce this problem myself (gcc 4.7.2 on my Linux distribution creates correct code). The gcc compiler the user is using is

gcc version 4.7.2 (Exherbo gcc-4.7.2-r2)

(see comment 18 in the above ticket).
Comment 1 Richard Biener 2013-01-11 11:23:02 UTC
Your assumptions are wrong according to documentation:

@item cdecl
@cindex functions that do pop the argument stack on the 386
@opindex mrtd
On the Intel 386, the @code{cdecl} attribute causes the compiler to
assume that the calling function pops off the stack space used to
pass arguments.  This is
useful to override the effects of the @option{-mrtd} switch.

which doesn't say that all arguments will be passed on the stack.
Comment 2 Richard Biener 2013-01-11 11:26:44 UTC
Please provide preprocessed source and compiler command-line.

I suppose VBoxHost_RTR0MemObjGetPagePhysAddr is static so the compiler
is free to choose a more optimal calling convention.  If you want to
avoid this mark it with the used attribute, I don't think an explicit
regparm(0) will prevent this.

Just guessing, without a testcase.
Comment 3 Richard Biener 2013-01-11 11:36:36 UTC
Btw, I can't reproduce it with the testcase from the virtualbox ticket.  I
also cannot reproduce that regparm(0) does not correctly cancel -mregparm=3.
Comment 4 Frank Mehnert 2013-01-15 13:53:04 UTC
The problem is that I cannot reproduce this problem either. Some versions of gcc 4.7 might be affected (the affected user uses 'Exherbo gcc-4.7.2-r2'), others not. As far as I can tell, only Intel 32-bit code is affected.

Regarding -mrtd: I don't know if this command line parameter is used at all. I've just asked the affected user for the complete command line. But a function with 'cdecl,regparm(0)' should pass all parameters on the stack on 32-bit x86 machines, right? And even if some parameters are passed in registers, EDI is normally not used, only EAX, EDX or ECX but never EDI.
Comment 5 Frank Mehnert 2013-01-15 15:03:14 UTC
Actually I was able to reproduce generating the buggy code. My gcc version:

Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.2 (Debian 4.7.2-5) 

I used the memobj-r0drv.i file from the above ticket and used the following command line:

gcc -Os -m32 -c -o memobj-r0drv.o memobj-r0drv.i -mpreferred-stack-boundary=2

The -mpreferred-stack-boundary=2 is essential! Using this command line, the same wrong code is generated (parameter is assumed to be passed in the EDI register).
Comment 6 Jakub Jelinek 2013-01-15 15:22:14 UTC
You haven't provided the preprocessed testcase, so it is hard to guess, but generally, if you have say a static function and call it also from assembly, you need __attribute__((used)) to prevent the compiler from using different calling conventions.  Otherwise, if the compiler can see all possible callers (inline asm doesn't count, then you have to use the used attribute), it can decide not to emit the function at all, or use whatever calling convention it thinks are best for the function.
Comment 7 Frank Mehnert 2013-01-15 15:37:43 UTC
Actually this looks like some mixup in the generated machine code:

 c1b:   8b 0f                   mov    (%edi),%ecx
 c1d:   8b 47 04                mov    0x4(%edi),%eax
 c20:   8d 91 00 10 00 00       lea    0x1000(%ecx),%edx
 c26:   81 fa ff 1f 00 00       cmp    $0x1fff,%edx
 c2c:   76 49                   jbe    c77 <VBoxHost_RTR0MemObjGetPagePhysAddr+0x5c>
 c2e:   81 39 10 12 61 19       cmpl   $0x19611210,(%ecx)
 c34:   75 41                   jne    c77 <VBoxHost_RTR0MemObjGetPagePhysAddr+0x5c>
 c36:   55                      push   %ebp
 c37:   89 e5                   mov    %esp,%ebp
 c39:   57                      push   %edi
 c3a:   53                      push   %ebx
 c3b:   8b 51 08                mov    0x8(%ecx),%edx
 c3e:   8d 7d 08                lea    0x8(%ebp),%edi
 c41:   8d 5a ff                lea    -0x1(%edx),%ebx
 c44:   83 fb 07                cmp    $0x7,%ebx
 c47:   77 34                   ja     c7d <VBoxHost_RTR0MemObjGetPagePhysAddr+0x62>

The EDI register is loaded from stack later and the two lines at 0xc1b and 0xc1d just access the EDI register before it is properly initialized.
Comment 8 Frank Mehnert 2013-01-15 16:54:53 UTC
Created attachment 29172 [details]
Preprocessed source
Comment 9 Frank Mehnert 2013-01-15 16:55:40 UTC
Just added the preprocessed source, sorry for the delay. It's the source code from the VirtualBox ticket.
Comment 10 Frank Mehnert 2013-01-15 16:57:48 UTC
And regarding your comment, Jakub, in this case it's not a question of not emitting the code of a function or not. The function code is emitted and the code is definitely wrong. EDI was not initialized before it was used.
Comment 11 Jakub Jelinek 2013-01-15 17:45:48 UTC
Sounds like shrink-wrapping bug to me:

/* { dg-options "-Os" } */
/* { dg-additional-options "-mpreferred-stack-boundary=2" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */

struct S { int s; unsigned long t; };

__attribute__ ((visibility ("hidden"))) unsigned long long bar (struct S *x, unsigned long y);

unsigned long long
foo (struct S *x, unsigned long y)
  unsigned long a;
  if (__builtin_expect (((unsigned long) (x) + 0x1000U < 0x2000U), 0))
    return ~0ULL;
  if (__builtin_expect (x->s <= 0 || x->s > 9, 0))
    return ~0ULL;
  a = x->t >> 12;
  if (y >= a && y == a)
    return ~0ULL;
  if (x->s == 3)
    return x->t + y * 4096;
  return bar (x, y);

The pre-prologue code can't use registers initialized in the prologue.
Comment 12 Jakub Jelinek 2013-01-15 18:28:31 UTC
I think:

--- gcc/function.c.jj	2013-01-11 09:02:55.000000000 +0100
+++ gcc/function.c	2013-01-15 19:23:20.648826011 +0100
@@ -6029,7 +6029,7 @@ thread_prologue_and_epilogue_insns (void
       if (pic_offset_table_rtx)
 	add_to_hard_reg_set (&set_up_by_prologue.set, Pmode,
-      if (stack_realign_drap && crtl->drap_reg)
+      if (crtl->drap_reg)
 	add_to_hard_reg_set (&set_up_by_prologue.set,
 			     GET_MODE (crtl->drap_reg),
 			     REGNO (crtl->drap_reg));

should fix it, in this case we pessimistically assume we might need to realign the stack and because of -Os we create vDRAP, but then find out we don't actually need to realign anything.
Comment 13 Jakub Jelinek 2013-01-15 22:58:28 UTC
Author: jakub
Date: Tue Jan 15 22:58:21 2013
New Revision: 195220

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195220
	PR target/55940
	* function.c (thread_prologue_and_epilogue_insns): Always
	add crtl->drap_reg to set_up_by_prologue.set, even if
	stack_realign_drap is false.

	* gcc.dg/pr55940.c: New test.

Comment 14 Jakub Jelinek 2013-01-16 07:59:51 UTC
Fixed on the trunk so far.
Comment 15 Frank Mehnert 2013-01-16 09:01:02 UTC
Great, thank you Jakub!

As it will take some time until the Linux distributions will update their gcc binaries to include this fix, do you have any suggestion how to  work around this bug by changing the code? Omitting compiler switches on the command line will not work in our scenario as the Linux kernel Makefiles define the gcc command line parameters.

And can you confirm that this bug only affects 32-bit x86 targets or does it affect 64-bit x86 targets as well?
Comment 16 Jakub Jelinek 2013-01-16 09:16:15 UTC
As a workaround, you can use something like
#if __GNUC__ == 4 && __GNUC_MINOR__ == 7
__attribute__((__optimize__ ("no-shrink-wrap")))
on the VBoxHost_RTR0MemObjGetPagePhysAddr function or don't use long long for the return type here (pass it by reference etc., that is the reason why gcc even thought about potentially needing the stack realignment), don't use -mpreferred-stack-boundary=2, or -Os, or use optimize (2) attribute, there are lots of options.
Comment 17 Jakub Jelinek 2013-02-01 14:09:53 UTC
Author: jakub
Date: Fri Feb  1 14:09:38 2013
New Revision: 195656

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=195656
	Backported from mainline
	2013-01-15  Jakub Jelinek  <jakub@redhat.com>

	PR target/55940
	* function.c (thread_prologue_and_epilogue_insns): Always
	add crtl->drap_reg to set_up_by_prologue.set, even if
	stack_realign_drap is false.

	* gcc.dg/pr55940.c: New test.

Comment 18 Jakub Jelinek 2013-02-01 14:30:47 UTC