On Linux/ia32, revision 175272: http://gcc.gnu.org/ml/gcc-cvs/2011-06/msg00763.html miscompiled 447.dealII in SPEC CPU 2006 with -m32 -O2 -msse2 -mfpmath=sse -ffast-math 447.dealII: copy 0 non-zero return code (exit code=11, signal=0) **************************************** Contents of dealII.err **************************************** /bin/sh: line 1: 11020 Segmentation fault (core dumped) taskset -c $TASK_CP U ../run_base_ref_lnx32-gcc.0000/dealII_base.lnx32-gcc 23 **************************************** Starting program: /export/regression/rrs/spec/2006/spec/benchspec/CPU2006/447.dealII/run/run_base_ref_lnx32-gcc.0000/dealII_base.lnx32-gcc 23 Refinement cycle: 0 Program received signal SIGSEGV, Segmentation fault. 0x08218006 in SparsityPattern::operator()(unsigned int, unsigned int) const () Missing separate debuginfos, use: debuginfo-install glibc-2.14-3.3.f15.i686 libgcc-4.6.0-9.fc15.i686 libstdc++-4.6.0-9.fc15.i686 (gdb) bt #0 0x08218006 in SparsityPattern::operator()(unsigned int, unsigned int) const () #1 0x0821c931 in LaplaceSolver::Solver<3>::assemble_matrix(LaplaceSolver::Solver<3>::LinearSystem&, TriaActiveIterator<3, DoFCellAccessor<3> > const&, TriaActiveIterator<3, DoFCellAccessor<3> > const&, Threads::DummyThreadMutex&) const () #2 0x0821fe7e in LaplaceSolver::Solver<3>::assemble_linear_system(LaplaceSolver::Solver<3>::LinearSystem&) () #3 0x08222a32 in LaplaceSolver::Solver<3>::solve_problem() () #4 0x0821fa3d in LaplaceSolver::WeightedResidual<3>::solve_problem() () #5 0x0821b92e in Framework<3>::run(Framework<3>::ProblemDescription const&) () #6 0x0804bf6d in main () (gdb)
Revert this patch: diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 4d2caa8..2716f78 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -10246,7 +10246,7 @@ tsubst_arg_types (tree arg_types, /* Do array-to-pointer, function-to-pointer conversion, and ignore top-level qualifiers as required. */ - type = TYPE_MAIN_VARIANT (type_decays_to (type)); + type = cv_unqualified (type_decays_to (type)); /* We do not substitute into default arguments here. The standard mandates that they be instantiated only when needed, which is fixes the crash.
step-14.cc is miscompiled: #0 SparsityPattern::operator() (this=0xcef8ffff, i=0, j=0) at sparsity_pattern.cc:608 #1 0x08223fc1 in add (value=0.075579727185634243, j=<optimized out>, i=<optimized out>, this=0xffffc76a) at include/lac/sparse_matrix.h:1709 #2 LaplaceSolver::Solver<3>::assemble_matrix (this=0x83845c8, linear_system=..., begin_cell=..., end_cell=..., mutex=...) at step-14.cc:796 #3 0x0822750e in operator() (arg4=..., arg3=..., arg2=..., arg1=..., this=0xffffc738) at include/base/thread_management.h:5382 #4 LaplaceSolver::Solver<3>::assemble_linear_system (this=0x83845c8, linear_system=...) at step-14.cc:716 #5 0x0822a0c2 in LaplaceSolver::Solver<3>::solve_problem (this=0x83845c8) at step-14.cc:676 #6 0x082270cd in do_call<void (LaplaceSolver::WeightedResidual<3>::*)(), LaplaceSolver::WeightedResidual<3>, boost::tuples::tuple<> > (obj=..., fun_ptr=<optimized out>) at include/fe/fe_q.h:239
Compiling step-14.cc with -O -fipa-sra -foptimize-sibling-calls is enough to produce the crash. I think this is likely to be an IPA-SRA bug; the front end change in question is correct. The new code is doing the same transformation that we already do in grokparms.
I've dived into the problem yesterday. Seems the problem is connected with tail call optimization. The refined difference is below. Assembler is extracted from step-14.cc Tail call optimization converts this code: .cfi_startproc pushl %ebx .cfi_def_cfa_offset 8 .cfi_offset 3, -8 subl $40, %esp .cfi_def_cfa_offset 48 movl 52(%esp), %eax movl 56(%esp), %ecx movl 60(%esp), %ebx movl %eax, %edx testb $1, %al je .L1498 movl (%ebx,%ecx), %edx movl -1(%edx,%eax), %edx .L1498: movl 76(%esp), %eax movl %eax, 16(%esp) movl 72(%esp), %eax movl %eax, 12(%esp) movl 68(%esp), %eax movl %eax, 8(%esp) movl 64(%esp), %eax movl %eax, 4(%esp) addl %ebx, %ecx movl %ecx, (%esp) call *%edx addl $40, %esp .cfi_def_cfa_offset 8 popl %ebx .cfi_def_cfa_offset 4 .cfi_restore 3 ret To the following tail call optimized .cfi_startproc subl $8, %esp .cfi_def_cfa_offset 12 movl %ebx, (%esp) movl %esi, 4(%esp) movl 16(%esp), %eax movl 20(%esp), %ecx movl 24(%esp), %ebx .cfi_offset 6, -8 .cfi_offset 3, -12 movl %eax, %edx testb $1, %al je .L1498 movl (%ebx,%ecx), %edx movl -1(%edx,%eax), %edx .L1498: movl 40(%esp), %eax movl %eax, 28(%esp) movl 36(%esp), %esi movl %esi, 24(%esp) movl 32(%esp), %esi movl %esi, 20(%esp) movl %eax, 16(%esp) addl %ebx, %ecx movl %ecx, 12(%esp) movl (%esp), %ebx movl 4(%esp), %esi addl $8, %esp .cfi_def_cfa_offset 4 .cfi_restore 6 .cfi_restore 3 jmp *%edx I've prepared to assemblers of step-14 with the only difference mentioned above. dealII compiled with first snippet works just fine, while tail-optimized case gives SegFault I believe the problem is that stack adjustment is wrong here. Continuing looking into
Problem here is that GCC incorrectly stores arguments to stack in case of tail-call opt. Here is snippet movl 40(%esp), %eax movl %eax, 28(%esp) movl 36(%esp), %esi movl %esi, 24(%esp) movl 32(%esp), %esi movl %esi, 20(%esp) movl %eax, 16(%esp) Argument from 28(%esp) is not copied to 28(%esp) at all. Correct sequence must be (semantically) like that: movl 40(%esp), %esi ; <----- Use esi to move memory movl 28(%esp), %eax ; <----- Save overlapping value movl %esi, 28(%esp) movl 36(%esp), %esi movl %esi, 24(%esp) movl 32(%esp), %esi movl %esi, 20(%esp) movl %eax, 16(%esp) ; <----- Store saved value Working toward the patch
I've looked into tail-call opt. Seems we need not call it at all if we have new/old stack addresses for parameters overlap. BTW, I think it is to conservative, anyway... We have call to pointer and passing of 5 params. Last param is out of our interest, but first 4 do. We have in expand: GIMPLE snippet: D.172468_17 = MEM[(struct cons &)&arg_refs + 12].head; D.172469_18 = MEM[(struct cons &)&arg_refs + 8].head; D.172470_19 = MEM[(struct cons &)&arg_refs + 4].head; D.172471_20 = MEM[(struct cons &)&arg_refs]; D.172462_21 = (sizetype) fun_ptr$__delta_26; D.172463_22 = obj_3(D) + D.172462_21; fun_ptr$__pfn_23 (D.172463_22, D.172471_20, D.172470_19, D.172469_18, D.172468_17); [tail call] And subsequently expanding it we have RTL: (insn 19 18 20 4 (set (reg/f:SI 80) (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 28 [0x1c])) [0 MEM[(struct cons &)&arg_refs + 12].head+0 S4 A32])) include/base/thread_management.h:1534 -1 (nil)) (insn 20 19 21 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 16 [0x10])) [0 S4 A32]) (reg/f:SI 80)) include/base/thread_management.h:1534 -1 (nil)) (insn 21 20 22 4 (set (reg/f:SI 81) (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 24 [0x18])) [0 MEM[(struct cons &)&arg_refs + 8].head+0 S4 A32])) include/base/thread_management.h:1534 -1 (nil)) (insn 22 21 23 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 12 [0xc])) [0 S4 A32]) (reg/f:SI 81)) include/base/thread_management.h:1534 -1 (nil)) (insn 23 22 24 4 (set (reg/f:SI 82) (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 20 [0x14])) [0 MEM[(struct cons &)&arg_refs + 4].head+0 S4 A32])) include/base/thread_management.h:1534 -1 (nil)) (insn 24 23 25 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 8 [0x8])) [0 S4 A32]) (reg/f:SI 82)) include/base/thread_management.h:1534 -1 (nil)) (insn 25 24 26 4 (parallel [ (set (reg:SI 83) (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 16 [0x10]))) (clobber (reg:CC 17 flags)) ]) step-14.cc:4271 -1 (nil)) (insn 26 25 27 4 (set (reg/f:SI 84) <---- (mem/f/c:SI (reg:SI 83) [0 MEM[(struct cons &)&arg_refs]+0 S4 A32])) include/base/thread_management.h:1534 -1 <---- (nil)) (insn 27 26 28 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 4 [0x4])) [0 S4 A32]) (reg/f:SI 84)) include/base/thread_management.h:1534 -1 (nil)) (insn 28 27 29 4 (parallel [ (set (reg:SI 85) (plus:SI (reg/v/f:SI 77 [ obj ]) (reg:SI 74 [ fun_ptr$__delta ]))) (clobber (reg:CC 17 flags)) ]) include/base/thread_management.h:1534 -1 (nil)) (insn 29 28 30 4 (set (mem:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32]) (reg:SI 85)) include/base/thread_management.h:1534 -1 (nil)) (call_insn/j 30 29 31 4 (call (mem:QI (reg/f:SI 59 [ fun_ptr$__pfn ]) [0 *fun_ptr$__pfn_23 S1 A8]) (const_int 20 [0x14])) include/base/thread_management.h:1534 -1 (nil) (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32])) (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 4 [0x4])) [0 S4 A32])) (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 8 [0x8])) [0 S4 A32])) (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 12 [0xc])) [0 S4 A32])) (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 16 [0x10])) [0 S4 A32])) (nil))))))) You can see that calculation of address of 4-th param is performed in different way. We calculate a sum, store it to register, load memory from that address and the put it on the new stack. BUT. Predicate which check for memory overlapping looks like this: static bool mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size) { HOST_WIDE_INT i; if (addr == crtl->args.internal_arg_pointer) i = 0; else if (GET_CODE (addr) == PLUS && XEXP (addr, 0) == crtl->args.internal_arg_pointer && CONST_INT_P (XEXP (addr, 1))) i = INTVAL (XEXP (addr, 1)); /* Return true for arg pointer based indexed addressing. */ else if (GET_CODE (addr) == PLUS && (XEXP (addr, 0) == crtl->args.internal_arg_pointer || XEXP (addr, 1) == crtl->args.internal_arg_pointer)) return true; else return false; ..... You can see that if we have load which does not look like (esp+*), routine always states that there is no overlap. That is why tail-call applied, while he mustn't.
Expanding arguments in different ways occurs because corresponding GIMPLE statements are of different types. For 'good' case we have expression of type COMPONENT_REF While for 'bad' one it is just a MEM_REF For that different kinds we have slightly different expanding. The different expression types comes from front-end, at least in einline phase accesses are different: [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167199_17 = MEM[(struct cons &)arg_list_2(D) + 12].head; [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167198_18 = MEM[(struct cons &)arg_list_2(D) + 8].head; [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167197_19 = MEM[(struct cons &)arg_list_2(D) + 4].head; [step-14.cc : 4271:1] D.167196_20 = MEM[(struct cons &)arg_list_2(D)]; [include/base/thread_management.h : 1534:13] D.167205_21 = (sizetype) fun_ptr$__delta_7; [include/base/thread_management.h : 1534:13] D.167204_22 = obj_1(D) + D.167205_21; [include/base/thread_management.h : 1534:13] iftmp.53_23 (D.167204_22, D.167196_20, D.167197_19, D.167198_18, D.167199_17); [include/base/thread_management.h : 1826:5] return; Having all that said I believe that the issue somewhat connected to fron-end generation. Jason, could you prompt me something? Your patch changes a line which has a comment: /* Do array-to-pointer, function-to-pointer conversion, and ignore top-level qualifiers as required. */ ...
If someone really need a quick fix, it may be done like this: gcc/expor.s: static bool mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size) { HOST_WIDE_INT i; if (addr == crtl->args.internal_arg_pointer) i = 0; else if (GET_CODE (addr) == PLUS && XEXP (addr, 0) == crtl->args.internal_arg_pointer && CONST_INT_P (XEXP (addr, 1))) i = INTVAL (XEXP (addr, 1)); /* Return true for arg pointer based indexed addressing. */ else if (GET_CODE (addr) == PLUS && (XEXP (addr, 0) == crtl->args.internal_arg_pointer || XEXP (addr, 1) == crtl->args.internal_arg_pointer)) else if (GET_CODE(addr) == REG) <----------- return true; else return false; But we possibly will be to conservative doing so
One more point for FE guys. Function definition have no difference between 4 args. Here it is include/base/thread_management.h: template <typename PFun, typename C, typename ArgList> static inline void do_call (PFun fun_ptr, C &obj, ArgList &arg_list, internal::return_value<RT> &ret_val, const int2type<4> &) { ret_val.set ((obj.*fun_ptr) (arg_list.template get<0>(), arg_list.template get<1>(), arg_list.template get<2>(), arg_list.template get<3>())); }
*** Bug 49639 has been marked as a duplicate of this bug. ***
(In reply to comment #7) > Expanding arguments in different ways occurs because corresponding GIMPLE > statements are of different types. > For 'good' case we have expression of type > COMPONENT_REF > > While for 'bad' one it is just a > MEM_REF > > For that different kinds we have slightly different expanding. > > The different expression types comes from front-end, at least in einline phase > accesses are different: The front end doesn't create MEM_REFs, they are all produced in the middle end. If tail call optimizations have trouble with them, that sounds like the bug. I'm unable to reproduce your work; when I compile step-14.cc I never hit mem_overlaps_already_clobbered_arg_p while compiling do_call. And reverting my change to tsubst_arg_types doesn't affect the use of MEM_REF in do_call. What optimization options are you using? I'm using -O -fipa-sra -foptimize-sibling-calls, as mentioned.
If it happens that the MEM_REF case (supposedly with the adjusted base pointer) does not work while the non-MEM_REF case works then this looks like a latent problem in the tailcall code to me. I see nothing wrong with the GIMPLE itself. Eventually we can adjust code-gen for the MEM_REF case to match that of the pure component-ref one, but that would again just paper over the issue.
I agree, that there is no problem with GIMPLE. As I mentioned we may just forbid tailcall opt for non-MEMREFS, but I suspect it will lead to significant perf. degradation. BTW, I am to extract simple testcase by now
Created attachment 24700 [details] Reduced testcase
The testcase reproduces the problem for me on trunk already with -m32 -O -foptimize-sibling-calls, so it doesn't seem to be IPA SRA related.
Yes. This is because expander prepares arguments like this: ... (insn 6 5 7 2 (parallel [ (set (reg:SI 64) (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 4 [0x4]))) (clobber (reg:CC 17 flags)) ]) 2.cc:103 -1 (nil)) (insn 7 6 8 2 (set (reg:SI 65) (mem/c:SI (plus:SI (reg:SI 64) (const_int 12 [0xc])) [0 MEM[(int &)&t + 12]+0 S4 A32])) 2.cc:103 -1 (nil)) (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args) (const_int 12 [0xc])) [0 S4 A32]) (reg:SI 65)) 2.cc:103 -1 (nil)) ... So, calls.c/mem_overlaps_already_clobbered_arg_p unable to determine, that data come from stack, so it returns true and enables tailcall.
I suspect that the tailcalling code does not expect TER to happen when expanding the call arguments? With -fno-tree-ter the issue also isn't appearant.
Hm, the argument setup for the tailcall clobbers the incoming stack argument. And without TER we do not overlap argument setup with the loads from the incoming stack argument.
Created attachment 24701 [details] Patch to make tailcall check more conservative Attached patch adds another check for clobbered stack area. If address comes from a register - we have no idea about destination address. That means we must act in conservative way - address possibly overlaps with stack area of interest.
With patch attached both tescase and 447.dealII passing
On Wed, 6 Jul 2011, kirill.yukhin at intel dot com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519 > > --- Comment #19 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-07-06 11:49:34 UTC --- > Created attachment 24701 [details] > --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24701 > Patch to make tailcall check more conservative > > Attached patch adds another check for clobbered stack area. > If address comes from a register - we have no idea about destination address. > That means we must act in conservative way - address possibly overlaps with > stack area of interest. That looks reasonable. Can you bootstrap & test this fix and post it to gcc-patches?
(In reply to comment #21) > On Wed, 6 Jul 2011, kirill.yukhin at intel dot com wrote: > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519 > > > > --- Comment #19 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-07-06 11:49:34 UTC --- > > Created attachment 24701 [details] > > --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24701 > > Patch to make tailcall check more conservative > > > > Attached patch adds another check for clobbered stack area. > > If address comes from a register - we have no idea about destination address. > > That means we must act in conservative way - address possibly overlaps with > > stack area of interest. > > That looks reasonable. Can you bootstrap & test this fix and post it to > gcc-patches? Already in progress :)
A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00359.html
Author: hjl Date: Fri Jul 8 13:12:03 2011 New Revision: 176042 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176042 Log: Fix PR middle-end/49519. gcc/ 2011-07-08 Kirill Yukhin <kirill.yukhin@intel.com> PR middle-end/49519 * calls.c (mem_overlaps_already_clobbered_arg_p): Additional check if address is stored in register. If so - give up. (check_sibcall_argument_overlap_1): Do not perform check of overlapping when it is call to address. gcc/tessuite/ 2011-07-08 Kirill Yukhin <kirill.yukhin@intel.com> PR middle-end/49519 * g++.dg/torture/pr49519.C: New test. Added: trunk/gcc/testsuite/g++.dg/torture/pr49519.C Modified: trunk/gcc/ChangeLog trunk/gcc/calls.c trunk/gcc/testsuite/ChangeLog
Test gcc.target/arm/sibcall-1.c starts failing with r176042 for arm-none-linux-gnueabi configured with defaults for arch and cpu. The failure can be reproduced with a simple cross cc1.
Fixed. For the missed tailcall opt please open a separate bugreport.
This patch seems too conservative and it appears that it will cause the compiler to miss all tailcalls with pointer arguments. It only matters if the register points to the incoming parameters, which can only happen in unusual cases. We should be able to determine that reliably. Janis, did you open a PR for the missing optimization? I didn't see one.
*** Bug 49709 has been marked as a duplicate of this bug. ***
On Tue, 12 Jul 2011, ian at airs dot com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519 > > Ian Lance Taylor <ian at airs dot com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |ian at airs dot com > > --- Comment #27 from Ian Lance Taylor <ian at airs dot com> 2011-07-12 04:37:01 UTC --- > This patch seems too conservative and it appears that it will cause the > compiler to miss all tailcalls with pointer arguments. It only matters if the > register points to the incoming parameters, which can only happen in unusual > cases. We should be able to determine that reliably. Janis, did you open a PR > for the missing optimization? I didn't see one. The point is, of course, we have to determine it conservatively correct. The patch makes sure we do that. Richard.
> This patch seems too conservative and it appears that it will cause the > compiler to miss all tailcalls with pointer arguments. It only matters if the > register points to the incoming parameters, which can only happen in unusual > cases. We should be able to determine that reliably. This is a bit of an overstatement though, as extern int foo (int, int *); int i; int bar (int a, int b) { int *p = &i; return foo (a, p); } will be caught. Even this extern int foo (int, int); int i; int bar (int a, int b) { int *p = &i; return foo (a, *p); } will be caught thanks to alias analysis at the Tree level. As for the "reliably", this time it's a little optimistic IMO. :-) But we can probably be less conservative with more work.
I filed PR middle-end/49719 for the sibcall-1.c failure.
Can we check 2 cases: 1. Indirect function call via a register (PR 50074). 2. Non of function arguments are pointers (PR 49179).
There is also the case where no argument has been stored on the stack yet. That's quite common for ABIs that pass arguments in registers.