Bug 56807 - mingw32: Conflict between stack realignment and stack probe destroys function argument in EAX
Summary: mingw32: Conflict between stack realignment and stack probe destroys function...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.7.2
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 60193
  Show dependency treegraph
 
Reported: 2013-04-02 07:45 UTC by Andrew Church
Modified: 2014-02-20 14:44 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2013-05-15 00:00:00


Attachments
testcase-1.c (265 bytes, text/plain)
2013-04-02 07:47 UTC, Andrew Church
Details
testcase-2.c (33 bytes, text/plain)
2013-04-02 07:48 UTC, Andrew Church
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Church 2013-04-02 07:45:31 UTC
When compiling for mingw32 with an incoming stack boundary less than the preferred stack boundary, if a non-leaf function with a large (>4000 bytes) stack size takes arguments in registers, the first argument may be destroyed depending on the actual stack alignment:

00000000 <_bar>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   50                      push   %eax
   7:   b8 1c 10 00 00          mov    $0x101c,%eax
   c:   e8 00 00 00 00          call   11 <_bar+0x11>
                        d: DISP32       ___chkstk_ms
  11:   29 c4                   sub    %eax,%esp
  13:   8b 45 f4                mov    -0xc(%ebp),%eax

Note that the stack realignment (at 3) takes place after the frame pointer is saved (at 1) but before the first argument is pushed to the stack (at 6), so the offset from the frame pointer to the saved first argument is unknown, yet GCC tries to reload the argument using the frame pointer as a base (at 13).  If the stack is not aligned to a multiple of 16 bytes before the function is called, the reload will get some random value from the stack instead of the first argument.

Configure options: --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/mingw32/gcc-bin/4.7.2 --includedir=/usr/lib/gcc/mingw32/4.7.2/include --datadir=/usr/share/gcc-data/mingw32/4.7.2 --mandir=/usr/share/gcc-data/mingw32/4.7.2/man --infodir=/usr/share/gcc-data/mingw32/4.7.2/info --with-gxx-include-dir=/usr/lib/gcc/mingw32/4.7.2/include/g++-v4 --host=i686-pc-linux-gnu --target=mingw32 --build=i686-pc-linux-gnu --disable-altivec --disable-fixed-point --without-ppl --without-cloog --enable-lto --disable-nls --with-system-zlib --enable-obsolete --disable-werror --enable-secureplt --disable-multilib --disable-libmudflap --disable-libssp --disable-libgomp --with-python-dir=/share/gcc-data/mingw32/4.7.2/python --enable-poison-system-directories --enable-checking=release --disable-libgcj --enable-libstdcxx-time --disable-libquadmath --enable-languages=c,c++ --with-sysroot=/usr/mingw32
Comment 1 Andrew Church 2013-04-02 07:47:41 UTC
Created attachment 29773 [details]
testcase-1.c

This file contains the bulk of the testcase.  The function foo() has to be defined in a separate file to trigger the bug.

Compile with: mingw32-gcc -mincoming-stack-boundary=2 -mpreferred-stack-boundary=4 testcase-1.c testcase-2.c -o testcase.exe
Comment 2 Andrew Church 2013-04-02 07:48:35 UTC
Created attachment 29774 [details]
testcase-2.c

Contains the definition of foo().
Comment 3 Andrew Church 2013-04-02 07:53:08 UTC
The __chkstk_ms call comes from allocate_stack_worker_probe_<mode> in i386.md, which is used in allocate_stack if the CHECK_STACK_LIMIT symbol is defined to a nonzero value.  CHECK_STACK_LIMIT defaults to zero, but it's set to 4000 for mingw32.
Comment 4 Kai Tietz 2013-05-15 11:59:11 UTC
Yes, this happens for function using eax as input-argument hand using stack-allocation.  btw it isn't specific to chkstk_ms function.

I will take a look
Comment 5 Kai Tietz 2013-12-05 23:45:18 UTC
Yeah, the issue is the use of frame-pointer ...  instead we should use here stack-pointer relative addressing as we are that way realigned stack-pointer agnostic.

Following patch solves the issue.  Could you give it a try?

Index: i386.c
===================================================================
--- i386.c	(Revision 205719)
+++ i386.c	(Arbeitskopie)
@@ -10934,18 +10934,21 @@ ix86_expand_prologue (void)
 	}
       m->fs.sp_offset += allocate;
 
+      /* Use stack_pointer_rtx for relative addressing so that code
+	 works for realigned stack, too.  */
       if (r10_live && eax_live)
         {
-	  t = choose_baseaddr (m->fs.sp_offset - allocate);
+	  t = plus_constant (Pmode, stack_pointer_rtx, 0 - allocate);
 	  emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
 			  gen_frame_mem (word_mode, t));
-	  t = choose_baseaddr (m->fs.sp_offset - allocate - UNITS_PER_WORD);
+	  t = plus_constant (Pmode, stack_pointer_rtx,
+			     0 - allocate - UNITS_PER_WORD);
 	  emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
 			  gen_frame_mem (word_mode, t));
 	}
       else if (eax_live || r10_live)
 	{
-	  t = choose_baseaddr (m->fs.sp_offset - allocate);
+	  t = plus_constant (Pmode, stack_pointer_rtx, 0 - allocate);
 	  emit_move_insn (gen_rtx_REG (word_mode,
 				       (eax_live ? AX_REG : R10_REG)),
 			  gen_frame_mem (word_mode, t));
Comment 6 Andrew Church 2013-12-06 06:20:28 UTC
Still broken for me, sorry.  Using SVN r205727 with the patch, the assembly now looks like:

00000000 <_bar>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   50                      push   %eax
   7:   b8 1c 10 00 00          mov    $0x101c,%eax
   c:   e8 00 00 00 00          call   11 <_bar+0x11>
                        d: DISP32       ___chkstk_ms
  11:   29 c4                   sub    %eax,%esp
  13:   8b 84 24 e4 ef ff ff    mov    -0x101c(%esp),%eax

so it's using the stack pointer but the offset is in the wrong direction.  Should "0 - allocate" be just "allocate" in the plus_constant() calls?
Comment 7 Kai Tietz 2013-12-06 13:42:31 UTC
duh.  Yes, of course the '0 -' is wrong.  We want to address backward.

Does the patch with this change fixes your issue?
Comment 8 Andrew Church 2013-12-06 16:52:39 UTC
Yes, by replacing "0 - allocate" with "allocate" it seems to work fine.  Sorry for not trying it out myself earlier.
Comment 9 Kai Tietz 2013-12-10 16:36:56 UTC
Author: ktietz
Date: Tue Dec 10 16:36:53 2013
New Revision: 205860

URL: http://gcc.gnu.org/viewcvs?rev=205860&root=gcc&view=rev
Log:
    PR target/56807
    * config/i386/i386.c (ix86_expand_prologue): Address saved
    registers stack-relative, not via frame-pointer.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
Comment 10 Kai Tietz 2013-12-10 16:40:38 UTC
Author: ktietz
Date: Tue Dec 10 16:40:36 2013
New Revision: 205861

URL: http://gcc.gnu.org/viewcvs?rev=205861&root=gcc&view=rev
Log:
        PR target/56807
        * config/i386/i386.c (ix86_expand_prologue): Address saved
        registers stack-relative, not via frame-pointer.

Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/i386/i386.c
Comment 11 Kai Tietz 2013-12-10 16:52:26 UTC
Author: ktietz
Date: Tue Dec 10 16:52:23 2013
New Revision: 205864

URL: http://gcc.gnu.org/viewcvs?rev=205864&root=gcc&view=rev
Log:
        PR target/56807
        * config/i386/i386.c (ix86_expand_prologue): Address saved
        registers stack-relative, not via frame-pointer.

Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/config/i386/i386.c
Comment 12 Kai Tietz 2013-12-10 16:53:24 UTC
Fixed on all active branches
Comment 13 Andrew Church 2013-12-11 06:08:53 UTC
FYI, the 4.7 branch no longer builds for me:

/tmp/gcc47-svn/gcc/config/i386/i386.c: In function 'ix86_expand_prologue':
/tmp/gcc47-svn/gcc/config/i386/i386.c:10434:4: warning: passing argument 1 of 'plus_constant' makes pointer from integer without a cast [enabled by default]
In file included from /tmp/gcc47-svn/gcc/config/i386/i386.c:26:0:
/tmp/gcc47-svn/gcc/rtl.h:1646:12: note: expected 'rtx' but argument is of type 'int'
/tmp/gcc47-svn/gcc/config/i386/i386.c:10434:4: warning: passing argument 2 of 'plus_constant' makes integer from pointer without a cast [enabled by default]
In file included from /tmp/gcc47-svn/gcc/config/i386/i386.c:26:0:
/tmp/gcc47-svn/gcc/rtl.h:1646:12: note: expected 'long int' but argument is of type 'rtx'
/tmp/gcc47-svn/gcc/config/i386/i386.c:10434:4: error: too many arguments to function 'plus_constant'
In file included from /tmp/gcc47-svn/gcc/config/i386/i386.c:26:0:
/tmp/gcc47-svn/gcc/rtl.h:1646:12: note: declared here

plus a couple more sets of the same errors for the other plus_constant() calls.

If I remove the "Pmode" argument, the resulting compiler seems to do the right thing and the test case passes.
Comment 14 Kai Tietz 2013-12-11 08:34:01 UTC
Sorry for that on the 4.7 branch.  plus_constant function takes one argument less on 4.7 branch.  Following patch fixes the issue for me:

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (Revision 205882)
+++ config/i386/i386.c  (Arbeitskopie)
@@ -10431,15 +10431,15 @@ ix86_expand_prologue (void)

       if (r10_live && eax_live)
         {
-         t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+         t = plus_constant (stack_pointer_rtx, allocate);
          emit_move_insn (r10, gen_frame_mem (Pmode, t));
-         t = plus_constant (Pmode, stack_pointer_rtx,
+         t = plus_constant (stack_pointer_rtx,
                             allocate - UNITS_PER_WORD);
          emit_move_insn (eax, gen_frame_mem (Pmode, t));
        }
       else if (eax_live || r10_live)
        {
-         t = plus_constant (Pmode, stack_pointer_rtx, allocate);
+         t = plus_constant (stack_pointer_rtx, allocate);
          emit_move_insn ((eax_live ? eax : r10), gen_frame_mem (Pmode, t));
        }
     }
Comment 15 Eric Botcazou 2013-12-11 08:51:50 UTC
> Sorry for that on the 4.7 branch.  plus_constant function takes one argument
> less on 4.7 branch.  Following patch fixes the issue for me:

Please test and install as obvious ASAP, all the x86/x86-64 ports are broken.
Comment 16 Kai Tietz 2013-12-11 14:05:58 UTC
Author: ktietz
Date: Wed Dec 11 14:05:56 2013
New Revision: 205895

URL: http://gcc.gnu.org/viewcvs?rev=205895&root=gcc&view=rev
Log:
Committed as obvious fix.

        PR target/56807
        * config/i386/i386.c (ix86_expand_prologue): plus_constant
        takes no mode-argument.


Modified:
    branches/gcc-4_7-branch/gcc/ChangeLog
    branches/gcc-4_7-branch/gcc/config/i386/i386.c
Comment 17 Andrew Church 2013-12-11 15:53:34 UTC
The 4.7 branch works fine for me now.  Thanks for the fixes.
Comment 18 Anton Mitrofanov 2013-12-16 19:05:26 UTC
This patch is ok for mingw32 target but may produce incorrect code for x86_64 linux target in case of saving/restoring both rax and r10. In that case during restoring of rax register (in "if (r10_live && eax_live)" path of http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.c?r1=205860&r2=205859&pathrev=205860) we will make move from incorrect address [rsp + allocate - UNITS_PER_WORD] while the saved value will be at address [rsp + allocate + UNITS_PER_WORD]. Here is possible code that can be generated (by looking at current gcc source code):

// suppose rsp == 1000 here
push rax // rsp == 992 ; [992] == rax
push r10 // rsp == 984 ; [984] == r10
mov rax, 400 // where 400 is allocate value
call allocate_stack
sub rsp, rax // rax == 400 so rsp == 584
mov r10, [rsp + 400] // 584 + 400 == 984 ; r10 = [984]
mov rax, [rsp + 400 - 8] // 584 + 400 - 8 == 976 ; rax = [976] <- WRONG

Instead the last instruction should be

mov rax, [rsp + 400 + 8] // 584 + 400 + 8 == 992 ; rax = [992]

Sorry, I can't write test case to trigger this code path so I will leave this comment here and not create new bug report (if you want you can move it to new bug report).
Comment 19 H.J. Lu 2013-12-16 19:20:22 UTC
(In reply to Anton Mitrofanov from comment #18)
> This patch is ok for mingw32 target but may produce incorrect code for
> x86_64 linux target in case of saving/restoring both rax and r10. In that
> case during restoring of rax register (in "if (r10_live && eax_live)" path
> of
> http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.
> c?r1=205860&r2=205859&pathrev=205860) we will make move from incorrect
> address [rsp + allocate - UNITS_PER_WORD] while the saved value will be at
> address [rsp + allocate + UNITS_PER_WORD]. Here is possible code that can be
> generated (by looking at current gcc source code):
> 
> // suppose rsp == 1000 here
> push rax // rsp == 992 ; [992] == rax
> push r10 // rsp == 984 ; [984] == r10
> mov rax, 400 // where 400 is allocate value
> call allocate_stack
> sub rsp, rax // rax == 400 so rsp == 584
> mov r10, [rsp + 400] // 584 + 400 == 984 ; r10 = [984]
> mov rax, [rsp + 400 - 8] // 584 + 400 - 8 == 976 ; rax = [976] <- WRONG
> 
> Instead the last instruction should be
> 
> mov rax, [rsp + 400 + 8] // 584 + 400 + 8 == 992 ; rax = [992]
> 

There are

      if (eax_live)
        {
          insn = emit_insn (gen_push (eax));
          allocate -= UNITS_PER_WORD;
...
      if (r10_live)
        {
          r10 = gen_rtx_REG (Pmode, R10_REG);
          insn = emit_insn (gen_push (r10));
          allocate -= UNITS_PER_WORD;
...
     if (r10_live && eax_live)
        {
          t = plus_constant (Pmode, stack_pointer_rtx, allocate);
          emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
                          gen_frame_mem (word_mode, t));
          t = plus_constant (Pmode, stack_pointer_rtx,
                             allocate - UNITS_PER_WORD);
          emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
                          gen_frame_mem (word_mode, t));
        }

They look OK to me.
Comment 20 Anton Mitrofanov 2013-12-16 19:31:17 UTC
I was talking about:

     if (r10_live && eax_live)
        {
          t = plus_constant (Pmode, stack_pointer_rtx, allocate);
          emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
                          gen_frame_mem (word_mode, t));
          t = plus_constant (Pmode, stack_pointer_rtx,
                             allocate - UNITS_PER_WORD);
          emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
                          gen_frame_mem (word_mode, t));
        }

And especially:
          t = plus_constant (Pmode, stack_pointer_rtx,
                             allocate - UNITS_PER_WORD);
Comment 21 Anton Mitrofanov 2013-12-16 19:32:15 UTC
It should be:
          t = plus_constant (Pmode, stack_pointer_rtx,
                             allocate + UNITS_PER_WORD);
Comment 22 H.J. Lu 2013-12-16 21:33:14 UTC
That piece of code is only trigged by -mstack-arg-probe, which is
specific to Windows:

  else if (!ix86_target_stack_probe ()
           || frame.stack_pointer_offset < CHECK_STACK_LIMIT)
    {
      pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
                                 GEN_INT (-allocate), -1,
                                 m->fs.cfa_reg == stack_pointer_rtx);
    }
  else  
    {
      rtx eax = gen_rtx_REG (Pmode, AX_REG);
      rtx r10 = NULL; 
      rtx (*adjust_stack_insn)(rtx, rtx, rtx); 
      const bool sp_is_cfa_reg = (m->fs.cfa_reg == stack_pointer_rtx);
      bool eax_live = false;
      bool r10_live = false;

      if (TARGET_64BIT)
        r10_live = (DECL_STATIC_CHAIN (current_function_decl) != 0);
      if (!TARGET_64BIT_MS_ABI)
        eax_live = ix86_eax_live_at_start_p ();

Is it possible to write a test with eax_live == true and r10_live == true?
Comment 23 Anton Mitrofanov 2013-12-16 22:04:01 UTC
>Is it possible to write a test with eax_live == true and r10_live == true?
I am really dunno. As I said I can't write sample which will trigger it (that is why it is only comment and not new bug report). But this code path exist so probably someone suggested that it can be triggered (otherwise what for to write it instead of some assert)?

>That piece of code is only trigged by -mstack-arg-probe, which is
specific to Windows
Can it really be used only for Windows target? I am not very good with gcc on other targets so don't know if it have any effect on linux or anything other.
Comment 24 Kai Tietz 2013-12-16 22:56:28 UTC
(In reply to Anton Mitrofanov from comment #23)
> >Is it possible to write a test with eax_live == true and r10_live == true?
> I am really dunno. As I said I can't write sample which will trigger it
> (that is why it is only comment and not new bug report). But this code path
> exist so probably someone suggested that it can be triggered (otherwise what
> for to write it instead of some assert)?
> 
> >That piece of code is only trigged by -mstack-arg-probe, which is specific to Windows

That isn't true.  -mstack-arg-probe is valid for all i386 (x86 and x86_64) targets.

> Can it really be used only for Windows target? I am not very good with gcc
> on other targets so don't know if it have any effect on linux or anything
> other.

Well, it can be used on other targets too.  Nevertheless you need the for this a implementation of the stack-probing routines (see here libgcc/config/i386 assembly-routine).

The interesting part is that this issue is just a x86_64 issue.  It is for sure no Windows issue, as for Windows targets eax_live and r10_live never can be both true.  The r10_live can just get true for x86_64, but not for x64 Windows, due drap-register isn't allowed for 64-bit Winodws targets.  The eax_live can't get true for 64-bit (Not quite sure if we need actually this limitation). Only for 32-bit it might be set.
For x86_64 targets (using amd64 ABI) it might be possible when drap-register is used together with -mstack-arg-probe (which is by default not supported).  So this case is in general hard to construct, and will be seen in real-live pretty unlikely.
Comment 25 Eric Botcazou 2014-02-14 09:30:34 UTC
The patch has introduced ICEs on x86-64, see PR target/60193.
Comment 26 Kai Tietz 2014-02-20 14:44:24 UTC
ICE was resolved for x86_64.