[hjl@gnu-9 prologue]$ cat bar.i
struct bar
{
  short word;
  struct bar* next;
};

int
foo (struct bar* c, int w, int delta)
{
  int i;

  if (c==((void *)0))
    return w;
  i = foo (c->next, w, delta) + delta;
  c->word = i;
  return i;
}
[hjl@gnu-9 prologue]$ /usr/gcc-4.0/bin/gcc -S -O2 bar.i
[hjl@gnu-9 prologue]$ head -15 bar.s
	.file	"bar.i"
	.text
	.p2align 4,,15
.globl foo
	.type	foo, @function
foo:
.LFB2:
	movq	%rbx, -16(%rsp)
.LCFI0:
	movq	%rbp, -8(%rsp)
.LCFI1:
	subq	$16, %rsp
.LCFI2:
	testq	%rdi, %rdi
	movq	%rdi, %rbx

We are putting values beyond the end of the stack. That is wrong and unsafe.
We should adjust the stack first. This regression was introduced by
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01666.html
How is this unsafe? There is a red zone of 128 bytes, and I don't see any push in this case.
	movq	%rbx, -16(%rsp)
.LCFI0:
	movq	%rbp, -8(%rsp)
.LCFI1:
	subq	$16, %rsp

This is valid since you don't push anything. Are you sure the bug is not in your code?
The issue is that gcc may later optimize "subq $16, %rsp" into a pair of pushq instructions.
(In reply to comment #3)
> The issue is gcc may optimize

With what code? I don't see it. So this is only a potential problem, and only after your change, and you don't show what your change is either (which is wrong).
From bar.i.26.flow2:

(insn/f 51 11 52 0 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -16 [0xfffffffffffffff0])) [0 S8 A8])
        (reg:DI 3 bx)) -1 (nil)
    (expr_list:REG_DEAD (reg:DI 3 bx)
        (nil)))

(insn/f 52 51 53 0 (set (mem:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -8 [0xfffffffffffffff8])) [0 S8 A8])
        (reg:DI 6 bp)) -1 (nil)
    (expr_list:REG_DEAD (reg:DI 6 bp)
        (nil)))

(insn/f 53 52 54 0 (parallel [
            (set (reg/f:DI 7 sp)
                (plus:DI (reg/f:DI 7 sp)
                    (const_int -16 [0xfffffffffffffff0])))
            (clobber (reg:CC 17 flags))
            (clobber (mem:BLK (scratch) [0 A8]))
        ]) -1 (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

Gcc doesn't tell the truth here, since memory isn't and shouldn't be clobbered when adjusting SP.
Subject: Re: [3.4/4.0/4.1 Regression]: ix86 prologue puts values beyond stack

> Gcc doesn't tell the truth here since memory isn't/shouldn't be clobbered
> when adjusting SP.

The transformation to push is probably valid only on x86, where any data on the opposite side of the stack pointer is implicitly clobbered (by signal handler stacks or whatever), so the conversion of sub to push is safe. This is not modelled very well by RTL and has indeed been a problem a few times in the past (the scheduler overactively moved stores, resulting in kernel crashes, if I remember right).

On x86-64 we should probably conditionalize this and use it only for functions not using the red zone, or avoid red-zone usage for the argument area when we are going to produce push instead of sub. (The reason for that optimization is to reduce the dependency chain, which is more important than the push conversion for Athlon; I am not sure what the case is for Nocona.)

Honza

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24419
We are working on ix86 optimization and ran into this problem.
Subject: Re: ix86 prologue clobbers memory when it shouldn't

> We are working on ix86 optimization and ran into this problem.

The patch quoted at the beginning of the thread should only affect compilation with -mred-zone enabled, and that is off for ix86. Can I see a 32-bit testcase? In general it is wrong on x86 to put data on unallocated stack in the first place...

Honza

hjl at lucon dot org changed:

           What            |Removed                     |Added
----------------------------------------------------------------------------
           Summary         |[3.4/4.0/4.1 Regression]:   |ix86 prologue clobbers
                           |ix86 prologue puts values   |memory when it shouldn't
                           |beyond stack                |
We only ran into the problem with the red zone enabled, which means x86-64. We have two issues:

1. When the prologue uses mov, memory shouldn't be clobbered. But
   ix86_expand_prologue calls pro_epilogue_adjust_stack, which clobbers
   memory.
2. We convert stack pointer subtractions to push even when memory isn't
   clobbered, with patterns like

;; Convert esp subtractions to push.
(define_peephole2
  [(match_scratch:SI 0 "r")
   (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))
	      (clobber (reg:CC FLAGS_REG))])]
  "optimize_size || !TARGET_SUB_ESP_4"
  [(clobber (match_dup 0))
   (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])

I don't think we can do that, since push will clobber memory.
Subject: Re: ix86 prologue clobbers memory when it shouldn't

> I don't think we can do that since push will clobber memory.

Without the red zone this is safe, but I see we do the conversion on 64-bit too, and that definitely is unsafe. What about this patch?

Index: i386.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.md,v
retrieving revision 1.655
diff -c -3 -p -r1.655 i386.md
*** i386.md	24 Sep 2005 15:47:57 -0000	1.655
--- i386.md	18 Oct 2005 18:16:00 -0000
***************
*** 19306,19316 ****
      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
  (define_peephole2
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_4"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
  
--- 19306,19320 ----
      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
+ ;; This conversion is safe only under assumption that unallocated stack is
+ ;; implicitly clobbered as specified by 32bit ABI (for signal handlers and such).
+ ;; This is not valid with red zone, but we can work harder and enable the
+ ;; optimization for functions that are not using it.
  (define_peephole2
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_4) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
***************
*** 19318,19324 ****
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_8"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
--- 19322,19328 ----
    [(match_scratch:SI 0 "r")
     (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_8) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))
     (set (mem:SI (pre_dec:SI (reg:SI SP_REG))) (match_dup 0))])
***************
*** 19438,19448 ****
      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
  (define_peephole2
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_4"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
  
--- 19442,19456 ----
      (clobber (mem:BLK (scratch)))])])
  
  ;; Convert esp subtractions to push.
+ ;; This conversion is safe only under assumption that unallocated stack is
+ ;; implicitly clobbered as specified by 32bit ABI (for signal handlers and such).
+ ;; This is not valid with red zone, but we can work harder and enable the
+ ;; optimization for functions that are not using it.
  (define_peephole2
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -8)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_4) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
***************
*** 19450,19456 ****
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -16)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "optimize_size || !TARGET_SUB_ESP_8"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
--- 19458,19464 ----
    [(match_scratch:DI 0 "r")
     (parallel [(set (reg:DI SP_REG) (plus:DI (reg:DI SP_REG) (const_int -16)))
  	      (clobber (reg:CC FLAGS_REG))])]
!   "(optimize_size || !TARGET_SUB_ESP_8) && !TARGET_RED_ZONE"
    [(clobber (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))
     (set (mem:DI (pre_dec:DI (reg:DI SP_REG))) (match_dup 0))])
The change looks reasonable. I will check it out. But I still don't like that pro_epilogue_adjust_stack doesn't tell the truth about whether memory is clobbered. Why not do something like

--- gcc/config/i386/i386.c.stack	2005-10-18 10:08:25.000000000 -0700
+++ gcc/config/i386/i386.c	2005-10-18 10:11:45.000000000 -0700
@@ -4323,22 +4323,40 @@ ix86_emit_save_regs_using_mov (rtx point
    otherwise.  */
 
 static void
-pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, int style)
+pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset, int style,
+			   bool clobber_memory)
 {
   rtx insn;
 
   if (! TARGET_64BIT)
-    insn = emit_insn (gen_pro_epilogue_adjust_stack_1 (dest, src, offset));
+    {
+      if (clobber_memory)
+	insn = emit_insn (gen_pro_epilogue_adjust_stack_1 (dest, src,
+							   offset));
+      else
+	insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+				      stack_pointer_rtx, offset));
+    }
   else if (x86_64_immediate_operand (offset, DImode))
-    insn = emit_insn (gen_pro_epilogue_adjust_stack_rex64 (dest, src, offset));
+    {
+      if (clobber_memory)
+	insn = emit_insn (gen_pro_epilogue_adjust_stack_rex64 (dest,
+							       src,
+							       offset));
+      else
+	insn = emit_insn (gen_adddi3 (stack_pointer_rtx,
+				      stack_pointer_rtx, offset));
+    }
   else
     {
       rtx r11;
       /* r11 is used by indirect sibcall return as well, set before the
 	 epilogue and used after the epilogue.  ATM indirect sibcall
 	 shouldn't be used together with huge frame sizes in one
-	 function because of the frame_size check in sibcall.c.  */
-      if (style == 0)
+	 function because of the frame_size check in sibcall.c.  If
+	 huge frame size is used, memory should always be clobbered
+	 when stack is adjusted.  */
+      if (style == 0 || !clobber_memory)
 	abort ();
       r11 = gen_rtx_REG (DImode, FIRST_REX_INT_REG + 3 /* R11 */);
       insn = emit_insn (gen_rtx_SET (DImode, r11, offset));
@@ -4360,6 +4378,7 @@ ix86_expand_prologue (void)
   bool pic_reg_used;
   struct ix86_frame frame;
   HOST_WIDE_INT allocate;
+  bool using_mov;
 
   ix86_compute_frame_layout (&frame);
 
@@ -4384,7 +4403,8 @@ ix86_expand_prologue (void)
 
   /* When using red zone we may start register saving before allocating
      the stack frame saving one cycle of the prologue.  */
-  if (TARGET_RED_ZONE && frame.save_regs_using_mov)
+  using_mov = TARGET_RED_ZONE && frame.save_regs_using_mov;
+  if (using_mov)
     ix86_emit_save_regs_using_mov (frame_pointer_needed
 				   ? hard_frame_pointer_rtx
 				   : stack_pointer_rtx,
 				   -frame.nregs * UNITS_PER_WORD);
@@ -4393,7 +4413,7 @@ ix86_expand_prologue (void)
     ;
   else if (! TARGET_STACK_PROBE || allocate < CHECK_STACK_LIMIT)
     pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-			       GEN_INT (-allocate), -1);
+			       GEN_INT (-allocate), -1, !using_mov);
   else
     {
       /* Only valid for Win32.  */
@@ -4572,7 +4592,7 @@ ix86_expand_epilogue (int style)
 	  emit_move_insn (hard_frame_pointer_rtx, tmp);
 	  pro_epilogue_adjust_stack (stack_pointer_rtx, sa,
-				     const0_rtx, style);
+				     const0_rtx, style, true);
 	}
       else
 	{
@@ -4586,7 +4606,7 @@ ix86_expand_epilogue (int style)
 	  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 				     GEN_INT (frame.to_allocate
 					      + frame.nregs * UNITS_PER_WORD),
-				     style);
+				     style, true);
       /* If not an i386, mov & pop is faster than "leave".  */
       else if (TARGET_USE_LEAVE || optimize_size
 	       || !cfun->machine->use_fast_prologue_epilogue)
@@ -4595,7 +4615,7 @@ ix86_expand_epilogue (int style)
 	{
 	  pro_epilogue_adjust_stack (stack_pointer_rtx,
 				     hard_frame_pointer_rtx,
-				     const0_rtx, style);
+				     const0_rtx, style, true);
 	  if (TARGET_64BIT)
 	    emit_insn (gen_popdi1 (hard_frame_pointer_rtx));
 	  else
@@ -4612,11 +4632,12 @@ ix86_expand_epilogue (int style)
 	    abort ();
 	  pro_epilogue_adjust_stack (stack_pointer_rtx,
 				     hard_frame_pointer_rtx,
-				     GEN_INT (offset), style);
+				     GEN_INT (offset), style, true);
 	}
       else if (frame.to_allocate)
 	pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-				   GEN_INT (frame.to_allocate), style);
+				   GEN_INT (frame.to_allocate), style,
+				   true);
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (ix86_save_reg (regno, false))
There is another issue when converting stack additions to pop:

(define_peephole2
  [(match_scratch:SI 0 "r")
   (parallel [(set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int 4)))
	      (clobber (reg:CC FLAGS_REG))])]
  ""
  [(parallel [(set (match_dup 0) (mem:SI (reg:SI SP_REG)))
	      (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int 4)))])]
  "")

Is that 100% safe with the red zone? In theory, the compiler can adjust the stack without clobbering memory, since there is a red zone.
Subject: Re: ix86 prologue clobbers memory when it shouldn't

> Is that 100% safe with red zone? In theory, compiler can adjust stack, but
> not clobber memory since there is a red zone.

It is reading memory here, not writing, so it should be safe...

Honza
Created attachment 10023 [details]
A patch

This is the patch I am testing. It seems to fix my problem. I am running the full bootstrap and check on Linux/i686 and Linux/x86-64 now.
An updated patch is posted at http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01107.html
What is the resolution of this bug? From what I gather from RTH, this is not a bug.
An updated patch is posted at http://gcc.gnu.org/ml/gcc-patches/2006-01/msg01061.html
Orphaned bug. HJ?
Created attachment 17343 [details]
The current patch for gcc 4.4.0 revision 144367
Invalid.