49519 – [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

Bug 49519 - [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

Summary: [4.7 Regression] Revision 175272 miscompiled 447.dealII in SPEC CPU 2006

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	middle-end (show other bugs)
Version:	4.7.0

Importance:	P3 normal
Target Milestone:	4.7.0
Assignee:	Not yet assigned to anyone

URL:	http://gcc.gnu.org/ml/gcc-patches/201...
Keywords:	wrong-code

Duplicates (2):	49639 49709 (view as bug list)
Depends on:
Blocks:	49719 50074
	Show dependency tree / graph

Reported:	2011-06-24 00:40 UTC by H.J. Lu
Modified:	2011-11-19 11:30 UTC (History)
CC List:	8 users (show)

See Also:
Host:
Target:	i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2011-06-25 03:09:56

Attachments
Reduced testcase (713 bytes, text/plain) 2011-07-06 10:25 UTC, Yukhin Kirill	Details
Patch to make tailcall check more conservative (263 bytes, patch) 2011-07-06 11:49 UTC, Yukhin Kirill	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description H.J. Lu 2011-06-24 00:40:15 UTC

On Linux/ia32, revision 175272:

http://gcc.gnu.org/ml/gcc-cvs/2011-06/msg00763.html

miscompiled 447.dealII in SPEC CPU 2006 with

-m32  -O2 -msse2 -mfpmath=sse -ffast-math

447.dealII: copy 0 non-zero return code (exit code=11, signal=0)


****************************************
Contents of dealII.err
****************************************
/bin/sh: line 1: 11020 Segmentation fault      (core dumped) taskset -c $TASK_CP
U ../run_base_ref_lnx32-gcc.0000/dealII_base.lnx32-gcc 23

****************************************

Starting program: /export/regression/rrs/spec/2006/spec/benchspec/CPU2006/447.dealII/run/run_base_ref_lnx32-gcc.0000/dealII_base.lnx32-gcc 23
Refinement cycle: 0

Program received signal SIGSEGV, Segmentation fault.
0x08218006 in SparsityPattern::operator()(unsigned int, unsigned int) const ()
Missing separate debuginfos, use: debuginfo-install glibc-2.14-3.3.f15.i686 libgcc-4.6.0-9.fc15.i686 libstdc++-4.6.0-9.fc15.i686
(gdb) bt
#0  0x08218006 in SparsityPattern::operator()(unsigned int, unsigned int) const
    ()
#1  0x0821c931 in LaplaceSolver::Solver<3>::assemble_matrix(LaplaceSolver::Solver<3>::LinearSystem&, TriaActiveIterator<3, DoFCellAccessor<3> > const&, TriaActiveIterator<3, DoFCellAccessor<3> > const&, Threads::DummyThreadMutex&) const ()
#2  0x0821fe7e in LaplaceSolver::Solver<3>::assemble_linear_system(LaplaceSolver::Solver<3>::LinearSystem&) ()
#3  0x08222a32 in LaplaceSolver::Solver<3>::solve_problem() ()
#4  0x0821fa3d in LaplaceSolver::WeightedResidual<3>::solve_problem() ()
#5  0x0821b92e in Framework<3>::run(Framework<3>::ProblemDescription const&) ()
#6  0x0804bf6d in main ()
(gdb)

Comment 1 H.J. Lu 2011-06-24 04:41:09 UTC

Revert this patch:

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4d2caa8..2716f78 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10246,7 +10246,7 @@ tsubst_arg_types (tree arg_types,

    /* Do array-to-pointer, function-to-pointer conversion, and ignore
       top-level qualifiers as required.  */
-    type = TYPE_MAIN_VARIANT (type_decays_to (type));
+    type = cv_unqualified (type_decays_to (type));

    /* We do not substitute into default arguments here.  The standard
       mandates that they be instantiated only when needed, which is

fixes the crash.

Comment 2 H.J. Lu 2011-06-24 16:37:20 UTC

step-14.cc is miscompiled:

#0  SparsityPattern::operator() (this=0xcef8ffff, i=0, j=0)
    at sparsity_pattern.cc:608
#1  0x08223fc1 in add (value=0.075579727185634243, j=<optimized out>, 
    i=<optimized out>, this=0xffffc76a) at include/lac/sparse_matrix.h:1709
#2  LaplaceSolver::Solver<3>::assemble_matrix (this=0x83845c8, 
    linear_system=..., begin_cell=..., end_cell=..., mutex=...)
    at step-14.cc:796
#3  0x0822750e in operator() (arg4=..., arg3=..., arg2=..., arg1=..., 
    this=0xffffc738) at include/base/thread_management.h:5382
#4  LaplaceSolver::Solver<3>::assemble_linear_system (this=0x83845c8, 
    linear_system=...) at step-14.cc:716
#5  0x0822a0c2 in LaplaceSolver::Solver<3>::solve_problem (this=0x83845c8)
    at step-14.cc:676
#6  0x082270cd in do_call<void (LaplaceSolver::WeightedResidual<3>::*)(), LaplaceSolver::WeightedResidual<3>, boost::tuples::tuple<> > (obj=..., 
    fun_ptr=<optimized out>) at include/fe/fe_q.h:239

Comment 3 Jason Merrill 2011-06-25 03:09:56 UTC

Compiling step-14.cc with -O -fipa-sra -foptimize-sibling-calls is enough to produce the crash.  I think this is likely to be an IPA-SRA bug; the front end change in question is correct.  The new code is doing the same transformation that we already do in grokparms.

Comment 4 Yukhin Kirill 2011-06-29 05:06:04 UTC

I've dived into the problem yesterday.
Seems the problem is connected with tail call optimization.
The refined difference is below. Assembler is extracted from step-14.cc

Tail call optimization converts this code:

        .cfi_startproc
        pushl   %ebx
        .cfi_def_cfa_offset 8
        .cfi_offset 3, -8
        subl    $40, %esp
        .cfi_def_cfa_offset 48
        movl    52(%esp), %eax
        movl    56(%esp), %ecx
        movl    60(%esp), %ebx
        movl    %eax, %edx
        testb   $1, %al
        je      .L1498
        movl    (%ebx,%ecx), %edx
        movl    -1(%edx,%eax), %edx
.L1498:
        movl    76(%esp), %eax
        movl    %eax, 16(%esp)
        movl    72(%esp), %eax
        movl    %eax, 12(%esp)
        movl    68(%esp), %eax
        movl    %eax, 8(%esp)
        movl    64(%esp), %eax
        movl    %eax, 4(%esp)
        addl    %ebx, %ecx
        movl    %ecx, (%esp)
        call    *%edx
        addl    $40, %esp
        .cfi_def_cfa_offset 8
        popl    %ebx
        .cfi_def_cfa_offset 4
        .cfi_restore 3
        ret

To the following tail call optimized
        .cfi_startproc
        subl    $8, %esp
        .cfi_def_cfa_offset 12
        movl    %ebx, (%esp)
        movl    %esi, 4(%esp)
        movl    16(%esp), %eax
        movl    20(%esp), %ecx
        movl    24(%esp), %ebx
        .cfi_offset 6, -8
        .cfi_offset 3, -12
        movl    %eax, %edx
        testb   $1, %al
        je      .L1498
        movl    (%ebx,%ecx), %edx
        movl    -1(%edx,%eax), %edx
.L1498:
        movl    40(%esp), %eax
        movl    %eax, 28(%esp)
        movl    36(%esp), %esi
        movl    %esi, 24(%esp)
        movl    32(%esp), %esi
        movl    %esi, 20(%esp)
        movl    %eax, 16(%esp)
        addl    %ebx, %ecx
        movl    %ecx, 12(%esp)
        movl    (%esp), %ebx
        movl    4(%esp), %esi
        addl    $8, %esp
        .cfi_def_cfa_offset 4
        .cfi_restore 6
        .cfi_restore 3
        jmp     *%edx

I've prepared to assemblers of step-14 with the only difference mentioned above.
dealII compiled with first snippet works just fine, while tail-optimized case gives SegFault

I believe the problem is that stack adjustment is wrong here. 
Continuing looking into

Comment 5 Yukhin Kirill 2011-06-29 12:24:25 UTC

Problem here is that GCC incorrectly stores arguments to stack in case of tail-call opt.
Here is snippet
        movl    40(%esp), %eax
        movl    %eax, 28(%esp)
        movl    36(%esp), %esi
        movl    %esi, 24(%esp)
        movl    32(%esp), %esi
        movl    %esi, 20(%esp)
        movl    %eax, 16(%esp)

Argument from 28(%esp) is not copied to 28(%esp) at all.
Correct sequence must be (semantically) like that:
        movl    40(%esp), %esi ; <----- Use esi to move memory
        movl    28(%esp), %eax ; <----- Save overlapping value
        movl    %esi, 28(%esp)
        movl    36(%esp), %esi
        movl    %esi, 24(%esp)
        movl    32(%esp), %esi
        movl    %esi, 20(%esp)
        movl    %eax, 16(%esp) ; <----- Store saved value

Working toward the patch

Comment 6 Yukhin Kirill 2011-06-30 15:11:41 UTC

I've looked into tail-call opt. Seems we need not call it at all if we have new/old stack addresses for parameters overlap. BTW, I think it is to conservative, anyway...
We have call to pointer and passing of 5 params. Last param is out of our interest, but first 4 do. 
We have in expand:
  GIMPLE snippet:
    D.172468_17 = MEM[(struct cons &)&arg_refs + 12].head;
    D.172469_18 = MEM[(struct cons &)&arg_refs + 8].head;
    D.172470_19 = MEM[(struct cons &)&arg_refs + 4].head;
    D.172471_20 = MEM[(struct cons &)&arg_refs];
    D.172462_21 = (sizetype) fun_ptr$__delta_26;
    D.172463_22 = obj_3(D) + D.172462_21;
    fun_ptr$__pfn_23 (D.172463_22, D.172471_20, D.172470_19, D.172469_18, D.172468_17); [tail call]

And subsequently expanding it we have RTL:
(insn 19 18 20 4 (set (reg/f:SI 80)
        (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 28 [0x1c])) [0 MEM[(struct cons &)&arg_refs + 12].head+0 S4 A32])) include/base/thread_management.h:1534 -1
     (nil))

(insn 20 19 21 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 16 [0x10])) [0 S4 A32])
        (reg/f:SI 80)) include/base/thread_management.h:1534 -1
     (nil))

(insn 21 20 22 4 (set (reg/f:SI 81)
        (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 24 [0x18])) [0 MEM[(struct cons &)&arg_refs + 8].head+0 S4 A32])) include/base/thread_management.h:1534 -1
     (nil))

(insn 22 21 23 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 12 [0xc])) [0 S4 A32])
        (reg/f:SI 81)) include/base/thread_management.h:1534 -1
     (nil))

(insn 23 22 24 4 (set (reg/f:SI 82)
        (mem/s/f/j/c:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 20 [0x14])) [0 MEM[(struct cons &)&arg_refs + 4].head+0 S4 A32])) include/base/thread_management.h:1534 -1
     (nil))

(insn 24 23 25 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 8 [0x8])) [0 S4 A32])
        (reg/f:SI 82)) include/base/thread_management.h:1534 -1
     (nil))

(insn 25 24 26 4 (parallel [
            (set (reg:SI 83)
                (plus:SI (reg/f:SI 53 virtual-incoming-args)
                    (const_int 16 [0x10])))
            (clobber (reg:CC 17 flags))
        ]) step-14.cc:4271 -1
     (nil))

(insn 26 25 27 4 (set (reg/f:SI 84)   <----
        (mem/f/c:SI (reg:SI 83) [0 MEM[(struct cons &)&arg_refs]+0 S4 A32])) include/base/thread_management.h:1534 -1 <----
     (nil))

(insn 27 26 28 4 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 4 [0x4])) [0 S4 A32])
        (reg/f:SI 84)) include/base/thread_management.h:1534 -1
     (nil))

(insn 28 27 29 4 (parallel [
            (set (reg:SI 85)
                (plus:SI (reg/v/f:SI 77 [ obj ])
                    (reg:SI 74 [ fun_ptr$__delta ])))
            (clobber (reg:CC 17 flags))
        ]) include/base/thread_management.h:1534 -1
     (nil))

(insn 29 28 30 4 (set (mem:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32])
        (reg:SI 85)) include/base/thread_management.h:1534 -1
     (nil))

(call_insn/j 30 29 31 4 (call (mem:QI (reg/f:SI 59 [ fun_ptr$__pfn ]) [0 *fun_ptr$__pfn_23 S1 A8])
        (const_int 20 [0x14])) include/base/thread_management.h:1534 -1
     (nil)
    (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (reg/f:SI 53 virtual-incoming-args) [0 S4 A32]))
        (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                        (const_int 4 [0x4])) [0 S4 A32]))
            (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                            (const_int 8 [0x8])) [0 S4 A32]))
                (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                                (const_int 12 [0xc])) [0 S4 A32]))
                    (expr_list:REG_DEP_TRUE (use (mem/f/i:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                                    (const_int 16 [0x10])) [0 S4 A32]))
                        (nil)))))))


You can see that calculation of address of 4-th param is performed in different way. We calculate a sum, store it to register, load memory from that address and the put it on the new stack.

BUT. Predicate which check for memory overlapping looks like this:
 static bool
  mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
  {
    HOST_WIDE_INT i;

    if (addr == crtl->args.internal_arg_pointer)
      i = 0;
    else if (GET_CODE (addr) == PLUS
             && XEXP (addr, 0) == crtl->args.internal_arg_pointer
             && CONST_INT_P (XEXP (addr, 1)))
      i = INTVAL (XEXP (addr, 1));
    /* Return true for arg pointer based indexed addressing.  */
    else if (GET_CODE (addr) == PLUS
             && (XEXP (addr, 0) == crtl->args.internal_arg_pointer
                 || XEXP (addr, 1) == crtl->args.internal_arg_pointer))
      return true;
    else
      return false;
.....

You can see that if we have load which does not look like (esp+*), routine always states that there is no overlap.

That is why tail-call applied, while he mustn't.

Comment 7 Yukhin Kirill 2011-06-30 15:22:58 UTC

Expanding arguments in different ways occurs because corresponding GIMPLE statements are of different types.
For 'good' case we have expression of type
  COMPONENT_REF

While for 'bad' one it is just a 
  MEM_REF

For that different kinds we have slightly different expanding.

The different expression types comes from front-end, at least in einline phase accesses are different:

  [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167199_17 = MEM[(struct cons &)arg_list_2(D) + 12].head;
  [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167198_18 = MEM[(struct cons &)arg_list_2(D) + 8].head;
  [include/boost/tuple/detail/tuple_basic.hpp : 130:14] D.167197_19 = MEM[(struct cons &)arg_list_2(D) + 4].head;
  [step-14.cc : 4271:1] D.167196_20 = MEM[(struct cons &)arg_list_2(D)];
  [include/base/thread_management.h : 1534:13] D.167205_21 = (sizetype) fun_ptr$__delta_7;
  [include/base/thread_management.h : 1534:13] D.167204_22 = obj_1(D) + D.167205_21;
  [include/base/thread_management.h : 1534:13] iftmp.53_23 (D.167204_22, D.167196_20, D.167197_19, D.167198_18, D.167199_17);
  [include/base/thread_management.h : 1826:5] return;

Having all that said I believe that the issue somewhat connected to fron-end generation.

Jason, could you prompt me something? 

Your patch changes a line which has a comment:
     /* Do array-to-pointer, function-to-pointer conversion, and ignore
        top-level qualifiers as required.  */
...

Comment 8 Yukhin Kirill 2011-06-30 15:26:36 UTC

If someone really need a quick fix, it may be done like this:
gcc/expor.s:

  static bool
  mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
  {
    HOST_WIDE_INT i;

    if (addr == crtl->args.internal_arg_pointer)
      i = 0;
    else if (GET_CODE (addr) == PLUS
             && XEXP (addr, 0) == crtl->args.internal_arg_pointer
             && CONST_INT_P (XEXP (addr, 1)))
      i = INTVAL (XEXP (addr, 1));
    /* Return true for arg pointer based indexed addressing.  */
    else if (GET_CODE (addr) == PLUS
             && (XEXP (addr, 0) == crtl->args.internal_arg_pointer
                 || XEXP (addr, 1) == crtl->args.internal_arg_pointer))
      else if (GET_CODE(addr) == REG)   <-----------
        return true;
      else
        return false;

But we possibly will be to conservative doing so

Comment 9 Yukhin Kirill 2011-06-30 15:37:04 UTC

One more point for FE guys.
Function definition have no difference between 4 args. Here it is

include/base/thread_management.h:
        template <typename PFun, typename C, typename ArgList>
        static inline void do_call (PFun     fun_ptr,
                                    C       &obj,
                                    ArgList &arg_list,
                                    internal::return_value<RT> &ret_val,
                                    const int2type<4> &)
          {
            ret_val.set ((obj.*fun_ptr) (arg_list.template get<0>(),
                                         arg_list.template get<1>(),
                                         arg_list.template get<2>(),
                                         arg_list.template get<3>()));
          }

Comment 10 Jason Merrill 2011-07-05 14:45:40 UTC

*** Bug 49639 has been marked as a duplicate of this bug. ***

Comment 11 Jason Merrill 2011-07-05 18:10:24 UTC

(In reply to comment #7)
> Expanding arguments in different ways occurs because corresponding GIMPLE
> statements are of different types.
> For 'good' case we have expression of type
>   COMPONENT_REF
> 
> While for 'bad' one it is just a 
>   MEM_REF
> 
> For that different kinds we have slightly different expanding.
> 
> The different expression types comes from front-end, at least in einline phase
> accesses are different:

The front end doesn't create MEM_REFs, they are all produced in the middle end.  If tail call optimizations have trouble with them, that sounds like the bug.

I'm unable to reproduce your work; when I compile step-14.cc I never hit mem_overlaps_already_clobbered_arg_p while compiling do_call.  And reverting my change to tsubst_arg_types doesn't affect the use of MEM_REF in do_call.  What optimization options are you using?  I'm using -O -fipa-sra -foptimize-sibling-calls, as mentioned.

Comment 12 Richard Biener 2011-07-06 08:38:56 UTC

If it happens that the MEM_REF case (supposedly with the adjusted base pointer)
does not work while the non-MEM_REF case works then this looks like a
latent problem in the tailcall code to me.  I see nothing wrong with the
GIMPLE itself.  Eventually we can adjust code-gen for the MEM_REF case to
match that of the pure component-ref one, but that would again just paper
over the issue.

Comment 13 Yukhin Kirill 2011-07-06 08:47:20 UTC

I agree, that there is no problem with GIMPLE. As I mentioned we may just forbid tailcall opt for non-MEMREFS, but I suspect it will lead to significant perf. degradation. 
BTW, I am to extract simple testcase by now

Comment 14 Yukhin Kirill 2011-07-06 10:25:01 UTC

Created attachment 24700 [details]
Reduced testcase

Comment 15 Richard Biener 2011-07-06 10:30:26 UTC

The testcase reproduces the problem for me on trunk already with
-m32 -O -foptimize-sibling-calls, so it doesn't seem to be IPA SRA related.

Comment 16 Yukhin Kirill 2011-07-06 10:35:03 UTC

Yes.

This is because expander prepares arguments like this:
...
(insn 6 5 7 2 (parallel [
            (set (reg:SI 64)
                (plus:SI (reg/f:SI 53 virtual-incoming-args)
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) 2.cc:103 -1
     (nil))

(insn 7 6 8 2 (set (reg:SI 65)
        (mem/c:SI (plus:SI (reg:SI 64)
                (const_int 12 [0xc])) [0 MEM[(int &)&t + 12]+0 S4 A32])) 2.cc:103 -1
     (nil))

(insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 53 virtual-incoming-args)
                (const_int 12 [0xc])) [0 S4 A32])
        (reg:SI 65)) 2.cc:103 -1
     (nil))
...

So, calls.c/mem_overlaps_already_clobbered_arg_p unable to determine, that data come from stack, so it returns true and enables tailcall.

Comment 17 Richard Biener 2011-07-06 10:37:57 UTC

I suspect that the tailcalling code does not expect TER to happen when
expanding the call arguments?  With -fno-tree-ter the issue also isn't appearant.

Comment 18 Richard Biener 2011-07-06 10:49:24 UTC

Hm, the argument setup for the tailcall clobbers the incoming stack argument.
And without TER we do not overlap argument setup with the loads from the
incoming stack argument.

Comment 19 Yukhin Kirill 2011-07-06 11:49:34 UTC

Created attachment 24701 [details]
Patch to make tailcall check more conservative

Attached patch adds another check for clobbered stack area.
If address comes from a register - we have no idea about destination address. That means we must act in conservative way - address possibly overlaps with stack area of interest.

Comment 20 Yukhin Kirill 2011-07-06 11:50:51 UTC

With patch attached both tescase and 447.dealII passing

Comment 21 rguenther@suse.de 2011-07-06 11:53:56 UTC

On Wed, 6 Jul 2011, kirill.yukhin at intel dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519
> 
> --- Comment #19 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-07-06 11:49:34 UTC ---
> Created attachment 24701 [details]
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24701
> Patch to make tailcall check more conservative
> 
> Attached patch adds another check for clobbered stack area.
> If address comes from a register - we have no idea about destination address.
> That means we must act in conservative way - address possibly overlaps with
> stack area of interest.

That looks reasonable.  Can you bootstrap & test this fix and post it to
gcc-patches?

Comment 22 Yukhin Kirill 2011-07-06 11:57:21 UTC

(In reply to comment #21)
> On Wed, 6 Jul 2011, kirill.yukhin at intel dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519
> > 
> > --- Comment #19 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-07-06 11:49:34 UTC ---
> > Created attachment 24701 [details]
> >   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24701
> > Patch to make tailcall check more conservative
> > 
> > Attached patch adds another check for clobbered stack area.
> > If address comes from a register - we have no idea about destination address.
> > That means we must act in conservative way - address possibly overlaps with
> > stack area of interest.
> 
> That looks reasonable.  Can you bootstrap & test this fix and post it to
> gcc-patches?

Already in progress :)

Comment 23 H.J. Lu 2011-07-06 19:37:39 UTC

A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00359.html

Comment 24 hjl@gcc.gnu.org 2011-07-08 13:12:05 UTC

Author: hjl
Date: Fri Jul  8 13:12:03 2011
New Revision: 176042

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176042
Log:
Fix PR middle-end/49519.

gcc/

2011-07-08  Kirill Yukhin  <kirill.yukhin@intel.com>

	PR middle-end/49519
	* calls.c (mem_overlaps_already_clobbered_arg_p): Additional
	check if address is stored in register. If so - give up.
	(check_sibcall_argument_overlap_1): Do not perform check of
	overlapping when it is call to address.

gcc/tessuite/

2011-07-08  Kirill Yukhin  <kirill.yukhin@intel.com>

	PR middle-end/49519
	* g++.dg/torture/pr49519.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/torture/pr49519.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/calls.c
    trunk/gcc/testsuite/ChangeLog

Comment 25 Janis Johnson 2011-07-08 23:36:25 UTC

Test gcc.target/arm/sibcall-1.c starts failing with r176042 for arm-none-linux-gnueabi configured with defaults for arch and cpu.  The failure can be reproduced with a simple cross cc1.

Comment 26 Richard Biener 2011-07-11 11:54:01 UTC

Fixed.  For the missed tailcall opt please open a separate bugreport.

Comment 27 Ian Lance Taylor 2011-07-12 04:37:01 UTC

This patch seems too conservative and it appears that it will cause the compiler to miss all tailcalls with pointer arguments.  It only matters if the register points to the incoming parameters, which can only  happen in unusual cases.  We should be able to determine that reliably.  Janis, did you open a PR for the missing optimization?  I didn't see one.

Comment 28 Ian Lance Taylor 2011-07-12 04:54:14 UTC

*** Bug 49709 has been marked as a duplicate of this bug. ***

Comment 29 rguenther@suse.de 2011-07-12 08:35:18 UTC

On Tue, 12 Jul 2011, ian at airs dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49519
> 
> Ian Lance Taylor <ian at airs dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |ian at airs dot com
> 
> --- Comment #27 from Ian Lance Taylor <ian at airs dot com> 2011-07-12 04:37:01 UTC ---
> This patch seems too conservative and it appears that it will cause the
> compiler to miss all tailcalls with pointer arguments.  It only matters if the
> register points to the incoming parameters, which can only  happen in unusual
> cases.  We should be able to determine that reliably.  Janis, did you open a PR
> for the missing optimization?  I didn't see one.

The point is, of course, we have to determine it conservatively correct.
The patch makes sure we do that.

Richard.

Comment 30 Eric Botcazou 2011-07-12 08:42:40 UTC

> This patch seems too conservative and it appears that it will cause the
> compiler to miss all tailcalls with pointer arguments.  It only matters if the
> register points to the incoming parameters, which can only  happen in unusual
> cases.  We should be able to determine that reliably.

This is a bit of an overstatement though, as

extern int foo (int, int *);

int i;

int bar (int a, int b)
{
  int *p = &i;
  return foo (a, p);
}

will be caught.  Even this

extern int foo (int, int);

int i;

int bar (int a, int b)
{
  int *p = &i;
  return foo (a, *p);
}

will be caught thanks to alias analysis at the Tree level.

As for the "reliably", this time it's a little optimistic IMO. :-)  But we can probably be less conservative with more work.

Comment 31 Janis Johnson 2011-07-12 15:06:03 UTC

I filed PR middle-end/49719 for the sibcall-1.c failure.

Comment 32 H.J. Lu 2011-08-15 14:12:04 UTC

Can we check 2 cases:

1. Indirect function call via a register (PR 50074).
2. Non of function arguments are pointers (PR 49179).

Comment 33 Jorn Wolfgang Rennecke 2011-11-19 11:30:34 UTC

There is also the case where no argument has been stored on the stack yet.
That's quite common for ABIs that pass arguments in registers.