Bug 24419 - ix86 prologue clobbers memory when it shouldn't
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P2 major
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2005-10-18 00:48 UTC by H.J. Lu
Modified: 2012-01-10 15:49 UTC (History)
3 users

Target: x86_64-unknown-linux-gnu
Last reconfirmed: 2009-02-23 22:56:13


Attachments
A patch (2.05 KB, patch)
2005-10-18 19:38 UTC, H.J. Lu
The current patch for gcc 4.4.0 revision 144367 (1.25 KB, patch)
2009-02-22 16:52 UTC, H.J. Lu

Description H.J. Lu 2005-10-18 00:48:50 UTC
[hjl@gnu-9 prologue]$ cat bar.i
struct bar {
    short word;
    struct bar* next;
};
int foo (struct bar* c, int w, int delta)
{
  int i;
  if (c==((void *)0)) return w;
  i = foo (c->next, w, delta) + delta;
  c->word = i;
  return i;
}
[hjl@gnu-9 prologue]$ /usr/gcc-4.0/bin/gcc -S -O2  bar.i
[hjl@gnu-9 prologue]$ head -15 bar.s
        .file   "bar.i"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB2:
        movq    %rbx, -16(%rsp)
.LCFI0:
        movq    %rbp, -8(%rsp)
.LCFI1:
        subq    $16, %rsp
.LCFI2:
        testq   %rdi, %rdi
        movq    %rdi, %rbx

We are putting values beyond the end of the stack.  This is wrong and
unsafe; we should adjust the stack first.  This regression was introduced by

http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01666.html
Comment 1 Andrew Pinski 2005-10-18 00:54:06 UTC
How is this unsafe?  There is a red zone of 128 bytes.  I don't see any pushl in this case.
Comment 2 Andrew Pinski 2005-10-18 00:55:41 UTC
        movq    %rbx, -16(%rsp)
.LCFI0:
        movq    %rbp, -8(%rsp)
.LCFI1:
        subq    $16, %rsp

This is valid as you don't push anything.  Are you sure that the bug is not in your code?
Comment 3 H.J. Lu 2005-10-18 00:57:25 UTC
The issue is that GCC may optimize

subq    $16, %rsp

to

pushq
pushq

later.
Comment 4 Andrew Pinski 2005-10-18 01:00:03 UTC
(In reply to comment #3)
> The issue is gcc may optimize

With what code?  I don't see it.  So this is only a possible problem, and only after your change, and you don't show what your change is either (which is wrong).
Comment 5 H.J. Lu 2005-10-18 04:37:55 UTC
From bar.i.26.flow2:

(insn/f 51 11 52 0 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -16 [0xfffffffffffffff0])) [0 S8 A8])
        (reg:DI 3 bx)) -1 (nil)
    (expr_list:REG_DEAD (reg:DI 3 bx)
        (nil)))

(insn/f 52 51 53 0 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -8 [0xfffffffffffffff8])) [0 S8 A8])
        (reg:DI 6 bp)) -1 (nil)
    (expr_list:REG_DEAD (reg:DI 6 bp)
        (nil)))

(insn/f 53 52 54 0 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int -16 [0xfffffffffffffff0])))
            (clobber (reg:CC 17 flags))
            (clobber (mem:BLK (scratch) [0 A8]))
        ]) -1 (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

GCC doesn't tell the truth here, since memory isn't/shouldn't be clobbered
when adjusting SP.
Comment 6 Jan Hubicka 2005-10-18 17:22:36 UTC
Subject: Re:  [3.4/4.0/4.1 Regression]: ix86 prologue puts values beyond stack


The transformation to push is probably valid only on x86, where any data
on the opposite side of the stack pointer is implicitly clobbered (for signal
handler stacks or whatever), so the conversion of sub to push is safe.  This
is not very well modelled by RTL and indeed was a problem a few times in the
past (when the scheduler overactively moved the stores, resulting in kernel
crashes, if I remember right).
On x86-64 we probably should conditionalize this and use it only for
functions not using the red zone, or avoid red zone usage for the argument
area when we are going to produce push instead of sub (the reason
for that optimization is to reduce the dependency chain, which is more
important than the push conversion for Athlon; not sure what the case is for
Nocona).

Honza
Comment 7 H.J. Lu 2005-10-18 17:33:51 UTC
We are working on ix86 optimizations and ran into this problem.
Comment 8 Jan Hubicka 2005-10-18 17:41:53 UTC
Subject: Re:  ix86 prologue clobbers memory when it shouldn't


The patch quoted at the beginning of the thread should affect compilation
only with -mred-zone enabled, and that one is off for ix86.  Can I see a
32-bit testcase?  In general it is wrong on x86 to put data on unallocated
stack in the first place...

Honza
> hjl at lucon dot org changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Summary|[3.4/4.0/4.1 Regression]:   |ix86 prologue clobbers
>                    |ix86 prologue puts values   |memory when it shouldn't
>                    |beyond stack                |
Comment 9 H.J. Lu 2005-10-18 17:50:36 UTC
We only run into the problem with the red zone enabled, which means x86-64.
We have two issues:

1. When the prologue uses mov, memory shouldn't be clobbered. But
ix86_expand_prologue calls pro_epilogue_adjust_stack, which clobbers
memory.
2. We convert stack pointer subtractions to push even when memory isn't
marked as clobbered, with patterns like

;; Convert esp subtractions to push.
(define_peephole2
  [(match_scratch:SI 0 "r")
   (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))
              (clobber (reg:CC FLAGS_REG))])]
  "optimize_size || !TARGET_SUB_ESP_4"
  [(clobber (match_dup 0))
   (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])

I don't think we can do that, since push will clobber memory.
Comment 10 Jan Hubicka 2005-10-18 18:17:25 UTC
Subject: Re:  ix86 prologue clobbers memory when it shouldn't

Without red zone this is safe, but I see we do the conversion on 64-bit
too, which definitely is unsafe.  What about this patch?

Index: i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.655
diff -c -3 -p -r1.655 i386.md
*** i386.md	24 Sep 2005 15:47:57 -0000	1.655
--- i386.md	18 Oct 2005 18:16:00 -0000
***************
*** 19306,19316 ****
  	      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
  (define_peephole2
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_4"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
  
--- 19306,19320 ----
  	      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
+ ;; This conversion is safe only under assumption that unallocated stack is
+ ;; implicitly clobbered as specified by 32bit ABI (for signal handlers and such).
+ ;; This is not valid with red zone, but we can work harder and enable the
+ ;; optimization for functions that are not using it.
  (define_peephole2
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_4) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
  
***************
*** 19318,19324 ****
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_8"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
--- 19322,19328 ----
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_8) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
***************
*** 19438,19448 ****
  	      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
  (define_peephole2
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_4"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
  
--- 19442,19456 ----
  	      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
+ ;; This conversion is safe only under assumption that unallocated stack is
+ ;; implicitly clobbered as specified by 32bit ABI (for signal handlers and such).
+ ;; This is not valid with red zone, but we can work harder and enable the
+ ;; optimization for functions that are not using it.
  (define_peephole2
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_4) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
  
***************
*** 19450,19456 ****
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -16)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_8"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
--- 19458,19464 ----
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -16)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_8) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
Comment 11 H.J. Lu 2005-10-18 18:23:27 UTC
The change looks reasonable.  I will check it out.  But I still don't
like that pro_epilogue_adjust_stack doesn't tell the truth about whether
memory is clobbered.  Why not do something like

--- gcc/config/i386/i386.c.stack        2005-10-18 10:08:25.000000000 -0700
+++ gcc/config/i386/i386.c      2005-10-18 10:11:45.000000000 -0700
@@ -4323,22 +4323,40 @@ ix86_emit_save_regs_using_mov (rtx point
    otherwise.  */

 static void
-pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, int style)
+pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, int style,
+                          bool clobber_memory)
 {
   rtx insn;

   if (! TARGET_64BIT)
-    insn = emit_insn (gen_pro_epilogue_adjust_stack_1 (dest, src, offset));
+    {
+      if (clobber_memory)
+       insn = emit_insn (gen_pro_epilogue_adjust_stack_1 (dest, src,
+                                                          offset));
+      else
+       insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+                                     stack_pointer_rtx, offset));
+    }
   else if (x86_64_immediate_operand (offset, DImode))
-    insn = emit_insn (gen_pro_epilogue_adjust_stack_rex64 (dest, src, offset));
+    {
+      if (clobber_memory)
+        insn = emit_insn (gen_pro_epilogue_adjust_stack_rex64 (dest,
+                                                              src,
+                                                              offset));
+      else
+       insn = emit_insn (gen_adddi3 (stack_pointer_rtx,
+                                     stack_pointer_rtx, offset));
+    }
   else
     {
       rtx r11;
       /* r11 is used by indirect sibcall return as well, set before the
         epilogue and used after the epilogue.  ATM indirect sibcall
         shouldn't be used together with huge frame sizes in one
-        function because of the frame_size check in sibcall.c.  */
-      if (style == 0)
+        function because of the frame_size check in sibcall.c. If
+        huge frame size is used, memory should always be clobbered
+        when stack is adjusted.  */
+      if (style == 0 || !clobber_memory)
        abort ();
       r11 = gen_rtx_REG (DImode, FIRST_REX_INT_REG + 3 /* R11 */);
       insn = emit_insn (gen_rtx_SET (DImode, r11, offset));
@@ -4360,6 +4378,7 @@ ix86_expand_prologue (void)
   bool pic_reg_used;
   struct ix86_frame frame;
   HOST_WIDE_INT allocate;
+  bool using_mov;

   ix86_compute_frame_layout (&frame);

@@ -4384,7 +4403,8 @@ ix86_expand_prologue (void)

   /* When using red zone we may start register saving before allocating
      the stack frame saving one cycle of the prologue.  */
-  if (TARGET_RED_ZONE && frame.save_regs_using_mov)
+  using_mov = TARGET_RED_ZONE && frame.save_regs_using_mov;
+  if (using_mov)
     ix86_emit_save_regs_using_mov (frame_pointer_needed ? hard_frame_pointer_rtx
                                   : stack_pointer_rtx,
                                   -frame.nregs * UNITS_PER_WORD);
@@ -4393,7 +4413,7 @@ ix86_expand_prologue (void)
     ;
   else if (! TARGET_STACK_PROBE || allocate < CHECK_STACK_LIMIT)
     pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-                              GEN_INT (-allocate), -1);
+                              GEN_INT (-allocate), -1, !using_mov);
   else
     {
       /* Only valid for Win32.  */
@@ -4572,7 +4592,7 @@ ix86_expand_epilogue (int style)
              emit_move_insn (hard_frame_pointer_rtx, tmp);

              pro_epilogue_adjust_stack (stack_pointer_rtx, sa,
-                                        const0_rtx, style);
+                                        const0_rtx, style, true);
            }
          else
            {
@@ -4586,7 +4606,7 @@ ix86_expand_epilogue (int style)
        pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
                                   GEN_INT (frame.to_allocate
                                            + frame.nregs * UNITS_PER_WORD),
-                                  style);
+                                  style, true);
       /* If not an i386, mov & pop is faster than "leave".  */
       else if (TARGET_USE_LEAVE || optimize_size
               || !cfun->machine->use_fast_prologue_epilogue)
@@ -4595,7 +4615,7 @@ ix86_expand_epilogue (int style)
        {
          pro_epilogue_adjust_stack (stack_pointer_rtx,
                                     hard_frame_pointer_rtx,
-                                    const0_rtx, style);
+                                    const0_rtx, style, true);
          if (TARGET_64BIT)
            emit_insn (gen_popdi1 (hard_frame_pointer_rtx));
          else
@@ -4612,11 +4632,12 @@ ix86_expand_epilogue (int style)
            abort ();
          pro_epilogue_adjust_stack (stack_pointer_rtx,
                                     hard_frame_pointer_rtx,
-                                    GEN_INT (offset), style);
+                                    GEN_INT (offset), style, true);
        }
       else if (frame.to_allocate)
        pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-                                  GEN_INT (frame.to_allocate), style);
+                                  GEN_INT (frame.to_allocate), style,
+                                  true);

       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
        if (ix86_save_reg (regno, false))
Comment 12 H.J. Lu 2005-10-18 18:26:21 UTC
There is another issue when converting stack additions to pop:

(define_peephole2
  [(match_scratch:SI 0 "r")
   (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int 4)))
              (clobber (reg:CC FLAGS_REG))])]
  ""
  [(parallel [(set (match_dup 0) (mem:SI (reg:SI SP_REG)))
              (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int 4)))])]
  "")

Is that 100% safe with the red zone?  In theory, the compiler can adjust the
stack without clobbering memory, since there is a red zone.
Comment 13 Jan Hubicka 2005-10-18 18:29:43 UTC
Subject: Re:  ix86 prologue clobbers memory when it shouldn't


It is reading memory here, not writing, so it should be safe...

Honza
Comment 14 H.J. Lu 2005-10-18 19:38:10 UTC
Created attachment 10023 [details]
A patch

This is the patch I am testing. It seems to fix my problem. I am running
the full bootstrap and check on Linux/i686 and Linux/x86-64 now.
Comment 15 H.J. Lu 2005-10-19 04:06:18 UTC
An updated patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01107.html
Comment 16 Andrew Pinski 2006-01-18 00:35:38 UTC
What is the resolution of this bug?  From what I gather from RTH, this is not a bug.
Comment 17 H.J. Lu 2006-01-18 00:58:28 UTC
An updated patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2006-01/msg01061.html
Comment 18 Steven Bosscher 2009-02-22 16:38:31 UTC
Orphaned bug.
HJ?
Comment 19 H.J. Lu 2009-02-22 16:52:36 UTC
Created attachment 17343 [details]
The current patch for gcc 4.4.0 revision 144367
Comment 20 Richard Biener 2012-01-10 15:49:29 UTC
Invalid.