This is a bug report for gcc 4_3 branch. I will attach a test case, slightly reduced from zlib code. When compiling this test case for the x86_64-linux target with -O2 -fomit-frame-pointer, I see this at the start of the function: adler32: pushq %rbp movq %rdi, %rax andl $65535, %edi shrq $16, %rax movq %rsp, %rbp pushq %r15 andl $65535, %eax movl %edx, -140(%rbp) After %rsp was copied to %rbp, %r15 was pushed. So now %rsp is 8 bytes less than %rbp. The red zone is 128 bytes, so this means that any reference down to -136(%rbp) is valid. However, the code actually stores a value into -140(%rbp). This is invalid, and can cause subtle and unpredictable bugs depending upon when the kernel interrupts this code. This error happens because the scheduler moves this movl instruction ahead of some other pushq instructions. If all the pushq instructions happened first, the stack pointer would have been decremented sufficiently that the reference to -140(%rbp) would be safely within the red zone. 4.4 generates completely different, and seemingly better, code for this test case. However, I don't see anything in the 4.4 source code which would prevent this same bug from occurring given the right test case. So this is potentially a serious error for the 4.4 release as well, unless it has been fixed by some scheduling dependency that I haven't uncovered. However, I have not been able to find a test case.
Created attachment 17260 [details] Test case
(In reply to comment #0) > This is a bug report for gcc 4_3 branch. I will attach a test case, slightly > reduced from zlib code. When compiling this test case for the x86_64-linux > target with -O2 -fomit-frame-pointer, I see this at the start of the function: -fno-omit-frame-pointer > adler32: > pushq %rbp > movq %rdi, %rax > andl $65535, %edi > shrq $16, %rax > movq %rsp, %rbp > pushq %r15 > andl $65535, %eax > movl %edx, -140(%rbp) > > After %rsp was copied to %rbp, %r15 was pushed. So now %rsp is 8 bytes less > than %rbp. The red zone is 128 bytes, so this means that any reference down to > -136(%rbp) is valid. However, the code actually stores a value into > -140(%rbp). This is invalid, and can cause subtle and unpredictable bugs > depending upon when the kernel interrupts this code. We have several options here: 1. disable redzone with -fno-omit-frame-pointer 2. emit scheduling barrier just after the prologue for -fno-omit-frame-pointer 3. make instructions that access to redzone dependant on %rsp Confirmed, but since -fomit-frame-pointer is the default (and works OK) it is not a critical issue.
Yes, -fno-omit-frame-pointer, sorry. I don't see why this has anything to do with -fno-omit-frame-pointer per se. As far as I can see so far the same problem can arise with any function which happens to require a frame pointer for some reason, such as a call to __builtin_return_address or if profiling is turned on.
(In reply to comment #3) > Yes, -fno-omit-frame-pointer, sorry. > > I don't see why this has anything to do with -fno-omit-frame-pointer per se. > As far as I can see so far the same problem can arise with any function which > happens to require a frame pointer for some reason, such as a call to > __builtin_return_address or if profiling is turned on. -fno-omit-frame-pointer just forces usage of frame pointer in x86_64 case to trigger this problem. When frame pointer is used, there is no connection between %ebp and %esp at the point where prologue ends. The offset between %ebp and %esp is calculated at this point, but scheduler can still move instructions referring to %ebp all the way to (insn 567). (insn/f 567 566 568 2 pr39118.c:6 (set (reg/f:DI 6 bp) (reg/f:DI 7 sp)) -1 (nil)) (insn/f 568 567 569 2 pr39118.c:6 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (reg:DI 44 r15)) -1 (nil)) ... (insn/f 572 571 573 2 pr39118.c:6 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (reg:DI 3 bx)) -1 (nil)) (note 574 573 93 2 NOTE_INSN_PROLOGUE_END) (insn:HI 93 574 94 2 pr39118.c:6 (set (mem/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -136 [0xffffffffffffff78])) [14 buf+0 S8 A8]) (reg:DI 4 si [ buf ])) 89 {*movdi_1_rex64} (nil)) (insn:HI 94 93 95 2 pr39118.c:6 (set (mem/c:SI (plus:DI (reg/f:DI 6 bp) (const_int -140 [0xffffffffffffff74])) [15 len+0 S4 A8]) (reg:SI 1 dx [ len ])) 47 {*movsi_1} (nil)) This problem can be solved in the most elegant way by inserting some kind of artificial instruction at the end of prologue, perhaps (set %rbp)(unspec [(%rsp)] UNSPEC_REDZONE_BLOCKAGE)
This patch fixes wrong scheduling: Index: i386.md =================================================================== --- i386.md (revision 143890) +++ i386.md (working copy) @@ -67,6 +67,7 @@ (UNSPEC_DEF_CFA 15) (UNSPEC_SET_RIP 16) (UNSPEC_SET_GOT_OFFSET 17) + (UNSPEC_REDZONE_BLOCKAGE 18) ; TLS support (UNSPEC_TP 18) @@ -14885,6 +14886,14 @@ "" [(set_attr "length" "0")]) +(define_insn "redzone_blockage" + [(set (match_operand 0 "" "") + (unspec [(match_operand 1 "" "")] UNSPEC_REDZONE_BLOCKAGE))] + "" + "" + [(set_attr "length" "0")]) + + ;; Insn emitted into the body of a function to return from a function. ;; This is only done if the function's epilogue is known to be simple. ;; See comments for ix86_can_use_return_insn_p in i386.c. Index: i386.c =================================================================== --- i386.c (revision 143890) +++ i386.c (working copy) @@ -6492,6 +6492,10 @@ emit_insn (gen_blockage ()); } + if (frame_pointer_needed && frame.red_zone_size) + emit_insn (gen_redzone_blockage (hard_frame_pointer_rtx, + stack_pointer_rtx)); + /* Emit cld instruction if stringops are used in the function. */ if (TARGET_CLD && ix86_current_function_needs_cld) emit_insn (gen_cld ());
Gcc 4.1 -fno-omit-frame-pointer -O2 generates --- adler32: pushq %rbp movq %rdi, %rax andl $65535, %edi shrq $16, %rax movq %rsp, %rbp pushq %r15 andl $65535, %eax pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $16, %rsp movl %edx, -148(%rbp) --- Is this a regression?
Yes, it's a regression for 4.3 relative to 4.2 for this test case. I don't know if it is a general regression.
We should plug this for 4.3 and 4.4, so due to comment 6 I think this is a regression.
Subject: Bug 39118 Author: uros Date: Tue Feb 10 16:12:33 2009 New Revision: 144063 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144063 Log: PR target/39118 * config/i386/i386.c (expand_prologue): Emit blockage at the end of function prologue when frame pointer is used to access red zone area. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c
Subject: Bug 39118 Author: uros Date: Tue Feb 10 16:12:47 2009 New Revision: 144064 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144064 Log: PR target/39118 * config/i386/i386.c (expand_prologue): Emit blockage at the end of function prologue when frame pointer is used to access red zone area. Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/config/i386/i386.c
Fixed.
I didn't get around to commenting on the patch; I'll just note that it is conservative. We don't have to block every instruction, just those which use memory. Do we have to worry about the function epilogue? I don't see any reason why the scheduler would move any of the pop instructions ahead of something which references off of %rbp. But I also don't see anything which explicitly prevents that from happening.
Created attachment 17278 [details] Test case This test case is from Mark Heffernan. When compiling with -O2 -fno-omit-frame-pointer with gcc 4.3, it shows a red zone violation in the epilogue: movl -160(%rbp), %eax popq %r15 leave ret It does work with mainline, though.
I've verified that the problem in the epilogue occurs using the current 4.3 sources, so reopening the bug report. (There is no longer any problem in the prologue.)
(In reply to comment #12) > I didn't get around to commenting on the patch; I'll just note that it is > conservative. We don't have to block every instruction, just those which use > memory. True, but as described in the gcc-patches@ mailing list, this situation should not happen often. OTOTH, if there is some standard way to insert memory blockage no-ops, we can use these instead of unspec_volatile. > Do we have to worry about the function epilogue? I don't see any reason why > the scheduler would move any of the pop instructions ahead of something which > references off of %rbp. But I also don't see anything which explicitly > prevents that from happening. We should also add blockage there.
(In reply to comment #13) > This test case is from Mark Heffernan. When compiling with -O2 > -fno-omit-frame-pointer with gcc 4.3, it shows a red zone violation in the > epilogue: > > movl -160(%rbp), %eax > popq %r15 > leave > ret This does not show redzone violation, but following does: popq %rbx popq %r12 popq %r13 popq %r14 movl -160(%rbp), %eax popq %r15 leave ret Anyway, I have a patch.
Created attachment 17280 [details] Patch This patch inserts _memory_ blockages at the end of function prologue and at the beginning of function epilogue.
Created attachment 17281 [details] gcc44-pr39118.patch Patch I've bootstrapped/regtested on x86_64-linux.
Subject: Bug 39118 Author: uros Date: Wed Feb 11 11:43:24 2009 New Revision: 144100 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144100 Log: PR target/39118 * config/i386/i386.md (UNSPEC_MEMORY_BLOCKAGE): New constant. (memory_blockage): New expander. (*memory_blockage): New insn pattern. * config/i386/i386.c (ix86_expand_prologue): Use memory_blockage instead of general blockage at the end of function prologue when frame pointer is used to access red zone area. Do not emit blockage when profiling, it is emitted in generic code. (ix86_expand_epilogue): Emit memory_blockage at the beginning of function epilogue when frame pointer is used to access red zone area. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.c trunk/gcc/config/i386/i386.md
Subject: Bug 39118 Author: uros Date: Wed Feb 11 11:53:47 2009 New Revision: 144101 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144101 Log: PR target/39118 * config/i386/i386.md (UNSPEC_MEMORY_BLOCKAGE): New constant. (memory_blockage): New expander. (*memory_blockage): New insn pattern. * config/i386/i386.c (ix86_expand_prologue): Use memory_blockage instead of general blockage at the end of function prologue when frame pointer is used to access red zone area. Do not emit blockage when profiling, it is emitted in generic code. (ix86_expand_epilogue): Emit memory_blockage at the beginning of function epilogue when frame pointer is used to access red zone area. Modified: branches/gcc-4_3-branch/gcc/ChangeLog branches/gcc-4_3-branch/gcc/config/i386/i386.c branches/gcc-4_3-branch/gcc/config/i386/i386.md
Fixed again.
Thanks.
This may have caused bootstrap memory requirements for insn-recog to go up significantly enough to cause cc1: out of memory allocating 4064 bytes after a total of 660041728 bytes make[3]: *** [insn-recog.o] Error 1 for Andreas.
(In reply to comment #23) > This may have caused bootstrap memory requirements for insn-recog to go up > significantly enough to cause > > cc1: out of memory allocating 4064 bytes after a total of 660041728 bytes > make[3]: *** [insn-recog.o] Error 1 > FWIW, I have no problems with gcc 4.4 revision 144236 on Linux/ia32 with 1GB RAM/2GB swap as well as Linux/Intel64 with 4GB RAM/2GB swap.
Subject: Re: [4.3/4.4 Regression] x86_64 red zone violation On Tue, 17 Feb 2009, hjl dot tools at gmail dot com wrote: > ------- Comment #24 from hjl dot tools at gmail dot com 2009-02-17 18:45 ------- > (In reply to comment #23) > > This may have caused bootstrap memory requirements for insn-recog to go up > > significantly enough to cause > > > > cc1: out of memory allocating 4064 bytes after a total of 660041728 bytes > > make[3]: *** [insn-recog.o] Error 1 > > > > FWIW, I have no problems with gcc 4.4 revision 144236 on Linux/ia32 with > 1GB RAM/2GB swap as well as Linux/Intel64 with 4GB RAM/2GB swap. The above bails out at ~650MB, so I guess his machine has way less than 1GB ram. Richard.
Subject: Re: [4.3/4.4 Regression] x86_64 red zone violation On Tue, 17 Feb 2009, Richard Guenther wrote: > On Tue, 17 Feb 2009, hjl dot tools at gmail dot com wrote: > > > ------- Comment #24 from hjl dot tools at gmail dot com 2009-02-17 18:45 ------- > > (In reply to comment #23) > > > This may have caused bootstrap memory requirements for insn-recog to go up > > > significantly enough to cause > > > > > > cc1: out of memory allocating 4064 bytes after a total of 660041728 bytes > > > make[3]: *** [insn-recog.o] Error 1 > > > > > > > FWIW, I have no problems with gcc 4.4 revision 144236 on Linux/ia32 with > > 1GB RAM/2GB swap as well as Linux/Intel64 with 4GB RAM/2GB swap. > > The above bails out at ~650MB, so I guess his machine has way less than > 1GB ram. I can reproduce it with ulimit -v 500000 and --enable-checking=yes,rtl --disable-multilib --enable-languages=c. Richard.