Bug 44194 - struct returned by value generates useless stores
Summary: struct returned by value generates useless stores
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.4.3
: P3 enhancement
Target Milestone: 4.8.0
Assignee: Eric Botcazou
URL:
Keywords: missed-optimization
: 49646 (view as bug list)
Depends on:
Blocks: 49429
  Show dependency treegraph
 
Reported: 2010-05-19 05:06 UTC by Josh Haberman
Modified: 2012-11-02 00:28 UTC (History)
9 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-05-19 09:38:49


Attachments
Patch to dse.c to be less conservative with calls. (2.05 KB, patch)
2011-04-12 22:39 UTC, Easwaran Raman
Details | Diff
Fixes a bug in the previous patch (2.10 KB, patch)
2011-04-14 18:59 UTC, Easwaran Raman
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Josh Haberman 2010-05-19 05:06:28 UTC
Test case:

--

#include <stdint.h>

struct twoints { uint64_t a, b; } foo();
void bar(uint64_t a, uint64_t b);

void func() {
  struct twoints s = foo();
  bar(s.a, s.b);
}

--

$ gcc -save-temps -Wall -c -o testbad.o -msse2 -O3 -fomit-frame-pointer testbad.c 
$ objdump -d -r -M intel testbad.o

testbad.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <func>:
   0:	48 83 ec 28          	sub    rsp,0x28
   4:	31 c0                	xor    eax,eax
   6:	e8 00 00 00 00       	call   b <func+0xb>
			7: R_X86_64_PC32	foo-0x4
   b:	48 89 04 24          	mov    QWORD PTR [rsp],rax
   f:	48 89 54 24 08       	mov    QWORD PTR [rsp+0x8],rdx
  14:	48 89 d6             	mov    rsi,rdx
  17:	48 89 44 24 10       	mov    QWORD PTR [rsp+0x10],rax
  1c:	48 89 54 24 18       	mov    QWORD PTR [rsp+0x18],rdx
  21:	48 89 c7             	mov    rdi,rax
  24:	48 83 c4 28          	add    rsp,0x28
  28:	e9 00 00 00 00       	jmp    2d <func+0x2d>
			29: R_X86_64_PC32	bar-0x4

--

As you can see above, rax and rdx are stored to the stack twice, but these stores are unnecessary.

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.4.3-4ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Comment 1 Richard Biener 2010-05-19 09:38:49 UTC
Confirmed.

We already expand it that way:

;; s = foo ();

(insn 5 4 6 t.c:7 (set (reg:QI 0 ax)
        (const_int 0 [0x0])) -1 (nil))

(call_insn 6 5 7 t.c:7 (set (parallel:BLK [
                (expr_list:REG_DEP_TRUE (reg:DI 0 ax)
                    (const_int 0 [0x0]))
                (expr_list:REG_DEP_TRUE (reg:DI 1 dx)
                    (const_int 8 [0x8]))
            ])
        (call (mem:QI (symbol_ref:DI ("foo") [flags 0x41]  <function_decl 0x7ffff5aed500 foo>) [0 S1 A8])
            (const_int 0 [0x0]))) -1 (nil)
    (expr_list:REG_DEP_TRUE (use (reg:QI 0 ax))
        (nil)))

(insn 7 6 8 t.c:7 (set (reg:DI 60)
        (reg:DI 0 ax)) -1 (nil))

(insn 8 7 9 t.c:7 (set (reg:DI 61)
        (reg:DI 1 dx)) -1 (nil))

(insn 9 8 10 t.c:7 (set (mem/s/c:DI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [2 S8 A128])
        (reg:DI 60)) -1 (nil))

(insn 10 9 11 t.c:7 (set (mem/s/c:DI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -24 [0xffffffffffffffe8])) [2 S8 A64])
        (reg:DI 61)) -1 (nil))

(insn 11 10 12 t.c:7 (set (reg:DI 62)
        (mem/s/c:DI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -32 [0xffffffffffffffe0])) [2 S8 A128])) -1 (nil))

(insn 12 11 13 t.c:7 (set (mem/s/c:DI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -16 [0xfffffffffffffff0])) [2 s+0 S8 A128])
        (reg:DI 62)) -1 (nil))

(insn 13 12 14 t.c:7 (set (reg:DI 63)
        (mem/s/c:DI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -24 [0xffffffffffffffe8])) [2 S8 A64])) -1 (nil))

(insn 14 13 0 t.c:7 (set (mem/s/c:DI (plus:DI (reg/f:DI 54 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [2 s+8 S8 A64])
        (reg:DI 63)) -1 (nil))


So we create a stack representation to copy it to the stack temporary
(which both are unneeded).  We should see that we can avoid the
temporary at all as there is no aggregate use of the struct left.
At least we should avoid the 2nd temporary.

I'm very sure there is a duplicate for this bug somewhere.

Also I wonder why RTL DSE cannot remove all the stores to the stack.
Comment 2 Jakub Jelinek 2010-05-19 10:13:41 UTC
RTL DSE doesn't handle this because the call to bar, which isn't a const function, is considered a wild read and thus makes all stores necessary in the global as well as local algorithm.
RTL DSE doesn't consider whether a frame based address could have possibly address taken or not and whether a call thus might read it or not.

For tail calls before reload, perhaps we could handle tail calls similarly to const calls (be a read of all argument stores) with the addition that it would act as a read for all constant address stores (basically wild read for anything
but frame based stores for the global algorithm, given that the stack is unwound before the tail call).
Comment 3 Richard Biener 2010-05-19 10:22:07 UTC
At leas the stores to s have alias info:

(insn 12 10 14 2 t.c:7 (set (mem/s/c:DI (plus:DI (reg/f:DI 20 frame)
                (const_int -16 [0xfffffffffffffff0])) [2 s+0 S8 A128])
        (reg:DI 60)) 89 {*movdi_1_rex64} (nil))

(insn 14 12 17 2 t.c:7 (set (mem/s/c:DI (plus:DI (reg/f:DI 20 frame)
                (const_int -8 [0xfffffffffffffff8])) [2 s+8 S8 A64])
        (reg:DI 61)) 89 {*movdi_1_rex64} (nil))

so RTL DSE could check whether the stack slot is aliased at all.

The other memory temporary should be avoided at expansion time already.
Comment 4 Josh Haberman 2010-07-10 01:38:30 UTC
This seems to happen even with POD return types:

int foo();
void bar(int a);

void func() {
  bar(foo());
}

In 32-bit mode it spills the return value to the stack for no reason.  It also seems to overallocate the stack (28 bytes allocated, only 4 used):

00000000 <func>:
   0:	83 ec 1c             	sub    esp,0x1c
   3:	e8 fc ff ff ff       	call   4 <func+0x4>
			4: R_386_PC32	foo
   8:	89 04 24             	mov    DWORD PTR [esp],eax
   b:	e8 fc ff ff ff       	call   c <func+0xc>
			c: R_386_PC32	bar
  10:	83 c4 1c             	add    esp,0x1c
  13:	c3                   	ret    

In 64-bit mode there is no store, but it *does* allocate 8 bytes of stack that it never uses:

0000000000000000 <func>:
   0:	48 83 ec 08          	sub    rsp,0x8
   4:	31 c0                	xor    eax,eax
   6:	e8 00 00 00 00       	call   b <func+0xb>
			7: R_X86_64_PC32	foo+0xfffffffffffffffc
   b:	48 83 c4 08          	add    rsp,0x8
   f:	89 c7                	mov    edi,eax
  11:	e9 00 00 00 00       	jmp    16 <func+0x16>
			12: R_X86_64_PC32	bar+0xfffffffffffffffc

Any idea how hard this bug is to fix?
Comment 5 Andrew Pinski 2010-07-10 01:40:33 UTC
>In 32-bit mode it spills the return value to the stack for no reason. 

Huh?  arguments are passed via the stack in 32bit mode.
Comment 6 Andrew Pinski 2010-07-10 01:42:22 UTC
>In 64-bit mode there is no store, but it *does* allocate 8 bytes of stack that
it never uses:

Oh no that is called aligning the stack to be 16 byte aligned.

>It also
seems to overallocate the stack (28 bytes allocated, only 4 used):

No, it is not over allocating the stack really, it does align it be to 16 byte aligned though. 
Comment 7 Josh Haberman 2010-07-10 01:48:25 UTC
I must have been on crack when I wrote that last comment.  Sorry for the noise.

Though I do wonder how difficult the original bug is to fix.  This seems to make it more expensive to return structures by value.
Comment 8 Josh Haberman 2011-02-24 03:27:04 UTC
I found another test case for this.  I thought I'd post it since it's extremely different than the original one.

--

class Foo {
 public:
  virtual ~Foo() {}
  virtual void DoSomething() = 0;
};

void foo(Foo *f, void (Foo::*member)()) {
  (f->*member)();
}

--

$ g++ -c -O3 -fomit-frame-pointer test.cc
$ objdump -M intel -d test.o

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z3fooP3FooMS_FvvE>:
   0:	40 f6 c6 01          	test   sil,0x1
   4:	48 89 74 24 e8       	mov    QWORD PTR [rsp-0x18],rsi
   9:	48 89 54 24 f0       	mov    QWORD PTR [rsp-0x10],rdx
   e:	74 10                	je     20 <_Z3fooP3FooMS_FvvE+0x20>
  10:	48 01 d7             	add    rdi,rdx
  13:	48 8b 07             	mov    rax,QWORD PTR [rdi]
  16:	48 8b 74 30 ff       	mov    rsi,QWORD PTR [rax+rsi*1-0x1]
  1b:	ff e6                	jmp    rsi
  1d:	0f 1f 00             	nop    DWORD PTR [rax]
  20:	48 01 d7             	add    rdi,rdx
  23:	ff e6                	jmp    rsi

--

We spilled rsi and rdx to the stack (in the red zone, it appears) for no reason (AFAICS).
Comment 9 Easwaran Raman 2011-04-12 22:39:23 UTC
Created attachment 23968 [details]
Patch to dse.c to be less conservative with calls.

Currently dse kills all stores on a call since call can do a wild read. But calls can not read off frame unless it is a local variable that can escape. This patch ensures frame based stores are not killed on a call if they can't escape. For the first struct return case, this removes the redundant stores to the local variable. Does this look reasonable?
Comment 10 Easwaran Raman 2011-04-14 18:59:26 UTC
Created attachment 23987 [details]
Fixes a bug in the previous patch
Comment 11 Eric Botcazou 2011-04-14 22:22:35 UTC
Richard, did you have something like that in mind when writing comment #3?  What aliasing info do we have at the RTL level now?
Comment 12 Richard Biener 2011-04-15 09:10:07 UTC
Yes, something like that.  Though can_escape can be made simpler and more
precise by

bool
can_escape (tree expr)
{
  tree base;
  if (!expr)
    return true;
  base = get_base_address (expr);
  if (DECL_P (base)
      && (!may_be_aliased (base)
          || !pt_solution_includes (&cfun->gimple_df.escaped, base)))
    return true;
  return false;
}
Comment 13 Easwaran Raman 2011-04-15 19:18:25 UTC
Richard, did you mean to write

static bool
can_escape (tree expr)
{
  tree base;
  if (!expr)
    return true;
  base = get_base_address (expr);
  if (DECL_P (base)
      && (!may_be_aliased (base)
          && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
    return false;
  return true;
}

Only case when we know it doesn't escape is if bas is a DECL_P and is not in cfun->gimple_df->escaped and not aliased, right? Actually, I'm wondering if it is sufficient to test just
DECL_P (base) && !pt_solution_includes (&cfun->gimple_df->escaped, base).
Comment 14 rguenther@suse.de 2011-04-15 19:31:36 UTC
On Fri, 15 Apr 2011, eraman at google dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> 
> Easwaran Raman <eraman at google dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |eraman at google dot com
> 
> --- Comment #13 from Easwaran Raman <eraman at google dot com> 2011-04-15 19:18:25 UTC ---
> Richard, did you mean to write
> 
> static bool
> can_escape (tree expr)
> {
>   tree base;
>   if (!expr)
>     return true;
>   base = get_base_address (expr);
>   if (DECL_P (base)
>       && (!may_be_aliased (base)
>           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
>     return false;
>   return true;
> }
> 
> Only case when we know it doesn't escape is if bas is a DECL_P and is not in
> cfun->gimple_df->escaped and not aliased, right? Actually, I'm wondering if it
> is sufficient to test just
> DECL_P (base) && !pt_solution_includes (&cfun->gimple_df->escaped, base).

No, because if the escaped solution for example includes ANYTHING then
the test will return true.  That !may-aliased variables are not
contained in ANYTHING isn't known w/o context.

Richard.
Comment 15 Easwaran Raman 2011-04-15 22:22:15 UTC
(In reply to comment #14)
> On Fri, 15 Apr 2011, eraman at google dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> > 
> > Easwaran Raman <eraman at google dot com> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |eraman at google dot com
> > 
> > --- Comment #13 from Easwaran Raman <eraman at google dot com> 2011-04-15 19:18:25 UTC ---
> > Richard, did you mean to write
> > 
> > static bool
> > can_escape (tree expr)
> > {
> >   tree base;
> >   if (!expr)
> >     return true;
> >   base = get_base_address (expr);
> >   if (DECL_P (base)
> >       && (!may_be_aliased (base)
> >           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
> >     return false;
> >   return true;
> > }
> > 
> > Only case when we know it doesn't escape is if bas is a DECL_P and is not in
> > cfun->gimple_df->escaped and not aliased, right? Actually, I'm wondering if it
> > is sufficient to test just
> > DECL_P (base) && !pt_solution_includes (&cfun->gimple_df->escaped, base).
> 
> No, because if the escaped solution for example includes ANYTHING then
> the test will return true.  That !may-aliased variables are not
> contained in ANYTHING isn't known w/o context.
> 
> Richard.

Correct me if I am wrong. If I understand you right, just using DECL_P (base) && !pt_solution_includes is conservative since pt_solution_includes may return true if the escaped solution contains ANYTHING. To make it less conservative, you're suggesting

  if (DECL_P (base)
      && (!may_be_aliased (base)
          || !pt_solution_includes (&cfun->gimple_df->escaped, base)))
    return false;

 I tried that and most Fortran tests are failing. One of the tests (default_format_1.f90) has the following RTL sequence:


(insn 30 29 32 4 (set (mem/s/c:SI (plus:DI (reg/f:DI 20 frame)
                (const_int -608 [0xfffffffffffffda0])) [2 dt_parm.0.common.flags+0 S4 A64])
        (const_int 16512 [0x4080])) default_format_1.inc:56 64 {*movsi_internal}
     (nil))

(insn 32 30 33 4 (set (reg:DI 5 di)
        (reg/f:DI 106)) default_format_1.inc:56 62 {*movdi_internal_rex64}
     (expr_list:REG_EQUAL (plus:DI (reg/f:DI 20 frame)
            (const_int -608 [0xfffffffffffffda0]))
        (nil)))

(call_insn 33 32 36 4 (call (mem:QI (symbol_ref:DI ("_gfortran_st_write") [flags 0x41]  <function_decl 0x7f301ed12e00 _gfortran_st_write>) [0 _gfortran_st_write S1 A8])
        (const_int 0 [0])) default_format_1.inc:56 618 {*call_0}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (nil))
    (expr_list:REG_DEP_TRUE (use (reg:DI 5 di))
        (nil)))

For the DECL dt_parm, pt_solution_includes (&cfun->gimple_df->escaped, base) returns false, even though its location is passed as a parameter to _gfortran_st_write.

I did test  with 
if (DECL_P (base)
      && (!may_be_aliased (base)
          && !pt_solution_includes (&cfun->gimple_df->escaped, base)))

which has no regressions. Is that what you suggest?
Comment 16 rguenther@suse.de 2011-04-17 10:44:02 UTC
On Fri, 15 Apr 2011, eraman at google dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> 
> --- Comment #15 from Easwaran Raman <eraman at google dot com> 2011-04-15 22:22:15 UTC ---
> (In reply to comment #14)
> > On Fri, 15 Apr 2011, eraman at google dot com wrote:
> > 
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> > > 
> > > Easwaran Raman <eraman at google dot com> changed:
> > > 
> > >            What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >                  CC|                            |eraman at google dot com
> > > 
> > > --- Comment #13 from Easwaran Raman <eraman at google dot com> 2011-04-15 19:18:25 UTC ---
> > > Richard, did you mean to write
> > > 
> > > static bool
> > > can_escape (tree expr)
> > > {
> > >   tree base;
> > >   if (!expr)
> > >     return true;
> > >   base = get_base_address (expr);
> > >   if (DECL_P (base)
> > >       && (!may_be_aliased (base)
> > >           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
> > >     return false;
> > >   return true;
> > > }
> > > 
> > > Only case when we know it doesn't escape is if bas is a DECL_P and is not in
> > > cfun->gimple_df->escaped and not aliased, right? Actually, I'm wondering if it
> > > is sufficient to test just
> > > DECL_P (base) && !pt_solution_includes (&cfun->gimple_df->escaped, base).
> > 
> > No, because if the escaped solution for example includes ANYTHING then
> > the test will return true.  That !may-aliased variables are not
> > contained in ANYTHING isn't known w/o context.
> > 
> > Richard.
> 
> Correct me if I am wrong. If I understand you right, just using DECL_P (base)
> && !pt_solution_includes is conservative since pt_solution_includes may return
> true if the escaped solution contains ANYTHING. To make it less conservative,
> you're suggesting
> 
>   if (DECL_P (base)
>       && (!may_be_aliased (base)
>           || !pt_solution_includes (&cfun->gimple_df->escaped, base)))
>     return false;
> 
>  I tried that and most Fortran tests are failing. One of the tests
> (default_format_1.f90) has the following RTL sequence:
> 
> 
> (insn 30 29 32 4 (set (mem/s/c:SI (plus:DI (reg/f:DI 20 frame)
>                 (const_int -608 [0xfffffffffffffda0])) [2
> dt_parm.0.common.flags+0 S4 A64])
>         (const_int 16512 [0x4080])) default_format_1.inc:56 64
> {*movsi_internal}
>      (nil))
> 
> (insn 32 30 33 4 (set (reg:DI 5 di)
>         (reg/f:DI 106)) default_format_1.inc:56 62 {*movdi_internal_rex64}
>      (expr_list:REG_EQUAL (plus:DI (reg/f:DI 20 frame)
>             (const_int -608 [0xfffffffffffffda0]))
>         (nil)))
> 
> (call_insn 33 32 36 4 (call (mem:QI (symbol_ref:DI ("_gfortran_st_write")
> [flags 0x41]  <function_decl 0x7f301ed12e00 _gfortran_st_write>) [0
> _gfortran_st_write S1 A8])
>         (const_int 0 [0])) default_format_1.inc:56 618 {*call_0}
>      (expr_list:REG_DEAD (reg:DI 5 di)
>         (nil))
>     (expr_list:REG_DEP_TRUE (use (reg:DI 5 di))
>         (nil)))
> 
> For the DECL dt_parm, pt_solution_includes (&cfun->gimple_df->escaped, base)
> returns false, even though its location is passed as a parameter to
> _gfortran_st_write.
> 
> I did test  with 
> if (DECL_P (base)
>       && (!may_be_aliased (base)
>           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
> 
> which has no regressions. Is that what you suggest?

No, the version with || should be ok.  The dt_parm argument does
not escape at the _gfortran_st_write call site because this
intrinsic function has a ".wW" fnspec attribute which specifies
the arguments do not escape.  What you indeed need to do in
addition to the escaped solution query is walk over all function
arguments and see if there is one that aliases 'base'.  That
may not be easily possible on RTL though.  On the tree level
we have a separate points-to set for such call clobbers/uses
but we do not preserve it for RTL.
Comment 17 Easwaran Raman 2011-04-21 00:20:51 UTC
On Sun, Apr 17, 2011 at 3:45 AM, rguenther at suse dot de
<gcc-bugzilla@gcc.gnu.org> wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
>
> --- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> 2011-04-17 10:44:02 UTC ---
> On Fri, 15 Apr 2011, eraman at google dot com wrote:
>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
>>
>> --- Comment #15 from Easwaran Raman <eraman at google dot com> 2011-04-15 22:22:15 UTC ---
>> (In reply to comment #14)
>> > On Fri, 15 Apr 2011, eraman at google dot com wrote:
>> >
>> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
>> > >
>> > > Easwaran Raman <eraman at google dot com> changed:
>> > >
>> > >            What    |Removed                     |Added
>> > > ----------------------------------------------------------------------------
>> > >                  CC|                            |eraman at google dot com
>> > >
>> > > --- Comment #13 from Easwaran Raman <eraman at google dot com> 2011-04-15 19:18:25 UTC ---
>> > > Richard, did you mean to write
>> > >
>> > > static bool
>> > > can_escape (tree expr)
>> > > {
>> > >   tree base;
>> > >   if (!expr)
>> > >     return true;
>> > >   base = get_base_address (expr);
>> > >   if (DECL_P (base)
>> > >       && (!may_be_aliased (base)
>> > >           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
>> > >     return false;
>> > >   return true;
>> > > }
>> > >
>> > > Only case when we know it doesn't escape is if bas is a DECL_P and is not in
>> > > cfun->gimple_df->escaped and not aliased, right? Actually, I'm wondering if it
>> > > is sufficient to test just
>> > > DECL_P (base) && !pt_solution_includes (&cfun->gimple_df->escaped, base).
>> >
>> > No, because if the escaped solution for example includes ANYTHING then
>> > the test will return true.  That !may-aliased variables are not
>> > contained in ANYTHING isn't known w/o context.
>> >
>> > Richard.
>>
>> Correct me if I am wrong. If I understand you right, just using DECL_P (base)
>> && !pt_solution_includes is conservative since pt_solution_includes may return
>> true if the escaped solution contains ANYTHING. To make it less conservative,
>> you're suggesting
>>
>>   if (DECL_P (base)
>>       && (!may_be_aliased (base)
>>           || !pt_solution_includes (&cfun->gimple_df->escaped, base)))
>>     return false;
>>
>>  I tried that and most Fortran tests are failing. One of the tests
>> (default_format_1.f90) has the following RTL sequence:
>>
>>
>> (insn 30 29 32 4 (set (mem/s/c:SI (plus:DI (reg/f:DI 20 frame)
>>                 (const_int -608 [0xfffffffffffffda0])) [2
>> dt_parm.0.common.flags+0 S4 A64])
>>         (const_int 16512 [0x4080])) default_format_1.inc:56 64
>> {*movsi_internal}
>>      (nil))
>>
>> (insn 32 30 33 4 (set (reg:DI 5 di)
>>         (reg/f:DI 106)) default_format_1.inc:56 62 {*movdi_internal_rex64}
>>      (expr_list:REG_EQUAL (plus:DI (reg/f:DI 20 frame)
>>             (const_int -608 [0xfffffffffffffda0]))
>>         (nil)))
>>
>> (call_insn 33 32 36 4 (call (mem:QI (symbol_ref:DI ("_gfortran_st_write")
>> [flags 0x41]  <function_decl 0x7f301ed12e00 _gfortran_st_write>) [0
>> _gfortran_st_write S1 A8])
>>         (const_int 0 [0])) default_format_1.inc:56 618 {*call_0}
>>      (expr_list:REG_DEAD (reg:DI 5 di)
>>         (nil))
>>     (expr_list:REG_DEP_TRUE (use (reg:DI 5 di))
>>         (nil)))
>>
>> For the DECL dt_parm, pt_solution_includes (&cfun->gimple_df->escaped, base)
>> returns false, even though its location is passed as a parameter to
>> _gfortran_st_write.
>>
>> I did test  with
>> if (DECL_P (base)
>>       && (!may_be_aliased (base)
>>           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
>>
>> which has no regressions. Is that what you suggest?
>
> No, the version with || should be ok.  The dt_parm argument does
> not escape at the _gfortran_st_write call site because this
> intrinsic function has a ".wW" fnspec attribute which specifies
> the arguments do not escape.  What you indeed need to do in
> addition to the escaped solution query is walk over all function
> arguments and see if there is one that aliases 'base'.  That
> may not be easily possible on RTL though.  On the tree level
> we have a separate points-to set for such call clobbers/uses
> but we do not preserve it for RTL.

Is it ok to make calls whose arg(s) have EAF_NOESCAPE kill all
locations off the frame in addition to killing all locations that
potentially escape (using the || case you suggested)? Will it be
better or worse than just checking !may_be_aliased (base) alone?

Thanks,
Easwaran


> --
> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.
>
Comment 18 rguenther@suse.de 2011-04-21 08:36:14 UTC
On Thu, 21 Apr 2011, eraman at google dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> 
> --- Comment #17 from Easwaran Raman <eraman at google dot com> 2011-04-21 00:20:51 UTC ---
> On Sun, Apr 17, 2011 at 3:45 AM, rguenther at suse dot de
> <gcc-bugzilla@gcc.gnu.org> wrote:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> >
> > --- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> 2011-04-17 10:44:02 UTC ---
> > On Fri, 15 Apr 2011, eraman at google dot com wrote:
> >
> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> >>
> >> --- Comment #15 from Easwaran Raman <eraman at google dot com> 2011-04-15 22:22:15 UTC ---
> >> (In reply to comment #14)
> >> > On Fri, 15 Apr 2011, eraman at google dot com wrote:
> >> >
> >> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> >> > >
> >> > > Easwaran Raman <eraman at google dot com> changed:
> >> > >
> >> > >            What    |Removed                     |Added
> >> > > ----------------------------------------------------------------------------
> >> > >                  CC|                            |eraman at google dot com
> >> > >
> >> > > --- Comment #13 from Easwaran Raman <eraman at google dot com> 2011-04-15 19:18:25 UTC ---
> >> > > Richard, did you mean to write
> >> > >
> >> > > static bool
> >> > > can_escape (tree expr)
> >> > > {
> >> > >   tree base;
> >> > >   if (!expr)
> >> > >     return true;
> >> > >   base = get_base_address (expr);
> >> > >   if (DECL_P (base)
> >> > >       && (!may_be_aliased (base)
> >> > >           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
> >> > >     return false;
> >> > >   return true;
> >> > > }
> >> > >
> >> > > Only case when we know it doesn't escape is if bas is a DECL_P and is not in
> >> > > cfun->gimple_df->escaped and not aliased, right? Actually, I'm wondering if it
> >> > > is sufficient to test just
> >> > > DECL_P (base) && !pt_solution_includes (&cfun->gimple_df->escaped, base).
> >> >
> >> > No, because if the escaped solution for example includes ANYTHING then
> >> > the test will return true.  That !may-aliased variables are not
> >> > contained in ANYTHING isn't known w/o context.
> >> >
> >> > Richard.
> >>
> >> Correct me if I am wrong. If I understand you right, just using DECL_P (base)
> >> && !pt_solution_includes is conservative since pt_solution_includes may return
> >> true if the escaped solution contains ANYTHING. To make it less conservative,
> >> you're suggesting
> >>
> >>   if (DECL_P (base)
> >>       && (!may_be_aliased (base)
> >>           || !pt_solution_includes (&cfun->gimple_df->escaped, base)))
> >>     return false;
> >>
> >>  I tried that and most Fortran tests are failing. One of the tests
> >> (default_format_1.f90) has the following RTL sequence:
> >>
> >>
> >> (insn 30 29 32 4 (set (mem/s/c:SI (plus:DI (reg/f:DI 20 frame)
> >>                 (const_int -608 [0xfffffffffffffda0])) [2
> >> dt_parm.0.common.flags+0 S4 A64])
> >>         (const_int 16512 [0x4080])) default_format_1.inc:56 64
> >> {*movsi_internal}
> >>      (nil))
> >>
> >> (insn 32 30 33 4 (set (reg:DI 5 di)
> >>         (reg/f:DI 106)) default_format_1.inc:56 62 {*movdi_internal_rex64}
> >>      (expr_list:REG_EQUAL (plus:DI (reg/f:DI 20 frame)
> >>             (const_int -608 [0xfffffffffffffda0]))
> >>         (nil)))
> >>
> >> (call_insn 33 32 36 4 (call (mem:QI (symbol_ref:DI ("_gfortran_st_write")
> >> [flags 0x41]  <function_decl 0x7f301ed12e00 _gfortran_st_write>) [0
> >> _gfortran_st_write S1 A8])
> >>         (const_int 0 [0])) default_format_1.inc:56 618 {*call_0}
> >>      (expr_list:REG_DEAD (reg:DI 5 di)
> >>         (nil))
> >>     (expr_list:REG_DEP_TRUE (use (reg:DI 5 di))
> >>         (nil)))
> >>
> >> For the DECL dt_parm, pt_solution_includes (&cfun->gimple_df->escaped, base)
> >> returns false, even though its location is passed as a parameter to
> >> _gfortran_st_write.
> >>
> >> I did test  with
> >> if (DECL_P (base)
> >>       && (!may_be_aliased (base)
> >>           && !pt_solution_includes (&cfun->gimple_df->escaped, base)))
> >>
> >> which has no regressions. Is that what you suggest?
> >
> > No, the version with || should be ok.  The dt_parm argument does
> > not escape at the _gfortran_st_write call site because this
> > intrinsic function has a ".wW" fnspec attribute which specifies
> > the arguments do not escape.  What you indeed need to do in
> > addition to the escaped solution query is walk over all function
> > arguments and see if there is one that aliases 'base'.  That
> > may not be easily possible on RTL though.  On the tree level
> > we have a separate points-to set for such call clobbers/uses
> > but we do not preserve it for RTL.
> 
> Is it ok to make calls whose arg(s) have EAF_NOESCAPE kill all
> locations off the frame in addition to killing all locations that
> potentially escape (using the || case you suggested)? Will it be
> better or worse than just checking !may_be_aliased (base) alone?

It's probably easiest and safest to just check !may_be_aliased for now.

Richard.
Comment 19 eraman 2011-06-14 22:58:24 UTC
Author: eraman
Date: Tue Jun 14 22:58:20 2011
New Revision: 175063

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175063
Log:
2011-06-14  Easwaran Raman  <eraman@google.com>

       PR rtl-optimization/44194
       * dse.c: Include tree-flow.h
       (insn_info): Add new field non_frame_wild_read.
       (group_info): Add new fields escaped_n and escaped_p.
       (kill_on_calls): New variable.
       (get_group_info): Initialize gi->escaped_n and gi->escaped_p.
       (dse_step0): Initialize kill_on_calls.
       (can_escape): New function.
       (set_usage_bits): Add additional parameter; record information
       about escaped locations.
       (record_store): Pass EXPR corresponding to MEM to
       set_usage_bits.
       (dse_step2_nospill): Set kill_on_calls based on
       group->escaped_n and group->escaped_n.
       (add_wild_read): Refactor into...
       (reset_active_stores): ... New method, and
       (free_read_records): ... New method.
       (add_non_frame_wild_read): New method.
       (scan_insn): Call add_non_frame_wild_read on non-const calls.
       (scan_reads_nospill): Handle instructions with
       non_frame_wild_read.
       (dse_step5_nospill): Call scan_reads_nospill for instructions
       marked as non_frame_wild_read.
       (dse_step7): Free escaped_n, escaped_p and kill_on_calls
       bitmaps.

testsuite/ChangeLog

2011-06-14  Easwaran Raman  <eraman@google.com>

       PR rtl-optimization/44194
       * gcc.dg/pr44194-1.c: New test.
       * gcc.dg/pr44194-2.c: New test.


Added:
    trunk/gcc/testsuite/gcc.dg/pr44194-1.c
    trunk/gcc/testsuite/gcc.dg/pr44194-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/dse.c
    trunk/gcc/testsuite/ChangeLog
Comment 20 Eric Botcazou 2011-06-15 16:48:45 UTC
Fixed on trunk.  Thanks for tackling this, Easwaran.
Comment 21 Easwaran Raman 2011-06-15 20:34:32 UTC
The DSE patch still leaves 2 redundant stores. The following patch will enable DSE to remove those two stores. Does this look ok?



Index: gcc/testsuite/gcc.dg/pr44194-1.c
===================================================================
--- gcc/testsuite/gcc.dg/pr44194-1.c	(revision 175082)
+++ gcc/testsuite/gcc.dg/pr44194-1.c	(working copy)
@@ -13,5 +13,5 @@ void func() {
   struct ints s = foo();
   bar(s.a, s.b);
 }
-/* { dg-final { scan-rtl-dump "global deletions = 2"  "dse1" } } */
+/* { dg-final { scan-rtl-dump "global deletions = 4"  "dse1" } } */
 /* { dg-final { cleanup-rtl-dump "dse1" } } */
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 175081)
+++ gcc/calls.c	(working copy)
@@ -3005,8 +3005,9 @@ expand_call (tree exp, rtx target, int ignore)
 	      tree nt = build_qualified_type (rettype,
 					      (TYPE_QUALS (rettype)
 					       | TYPE_QUAL_CONST));
-
+              tree target_expr = create_tmp_reg (rettype, NULL);
 	      target = assign_temp (nt, 0, 1, 1);
+              set_mem_expr (target, target_expr);
 	    }
 
 	  if (! rtx_equal_p (target, valreg))
Comment 22 Eric Botcazou 2011-06-15 21:09:18 UTC
> The DSE patch still leaves 2 redundant stores.

OK, I missed this, reopening...

> The following patch will enable DSE to remove those two stores. Does this
> look ok?

Calling into the gimplifier from the RTL expander doesn't look appropriate.

More fundamentally, it's a little unfortunate to spill to memory a value returned in registers.  Can we try to use emit_group_move_into_temps here instead (under the appropriate circumstances)?
Comment 23 davidxl 2011-06-15 23:14:50 UTC
(In reply to comment #22)
> > The DSE patch still leaves 2 redundant stores.
> 
> OK, I missed this, reopening...
> 
> > The following patch will enable DSE to remove those two stores. Does this
> > look ok?
> 
> Calling into the gimplifier from the RTL expander doesn't look appropriate.
> 
> More fundamentally, it's a little unfortunate to spill to memory a value
> returned in registers.  Can we try to use emit_group_move_into_temps here
> instead (under the appropriate circumstances)?

It would be nice if the expander does not spill the return into memory in the first place if possible.  On other hand tagging compiler created memory location with temp decls so that aliaser has the symbolic information seems a useful mechanism. 

Easwaran, can you post the patch to gcc-patches for more comments?

Thanks,

David
Comment 24 Eric Botcazou 2011-06-16 05:26:56 UTC
> It would be nice if the expander does not spill the return into memory in the
> first place if possible.  On other hand tagging compiler created memory
> location with temp decls so that aliaser has the symbolic information seems a
> useful mechanism. 

But a poor kludge if you can avoid spilling to memory in the first place.
Comment 25 davidxl 2011-06-16 07:41:57 UTC
On Wed, Jun 15, 2011 at 10:26 PM, ebotcazou at gcc dot gnu.org <
gcc-bugzilla@gcc.gnu.org> wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
>
> --- Comment #24 from Eric Botcazou <ebotcazou at gcc dot gnu.org>
> 2011-06-16 05:26:56 UTC ---
> > It would be nice if the expander does not spill the return into memory in
> the
> > first place if possible.  On other hand tagging compiler created memory
> > location with temp decls so that aliaser has the symbolic information
> seems a
> > useful mechanism.
>
> But a poor kludge if you can avoid spilling to memory in the first place.
>

No I am talking about more general situation where temporary memory is
created in RTL pass.

David


>
> --
> Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.
>
Comment 26 rguenther@suse.de 2011-06-16 08:36:21 UTC
On Wed, 15 Jun 2011, xinliangli at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> 
> davidxl <xinliangli at gmail dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |xinliangli at gmail dot com
> 
> --- Comment #23 from davidxl <xinliangli at gmail dot com> 2011-06-15 23:14:50 UTC ---
> (In reply to comment #22)
> > > The DSE patch still leaves 2 redundant stores.
> > 
> > OK, I missed this, reopening...
> > 
> > > The following patch will enable DSE to remove those two stores. Does this
> > > look ok?
> > 
> > Calling into the gimplifier from the RTL expander doesn't look appropriate.

It also should use create_tmp_var, not create_tmp_reg.  But I wonder why
memory allocated via assign_temp isn't marked in a way to let dse
do its job (I guess dse thinks that memory escapes?).

> > More fundamentally, it's a little unfortunate to spill to memory a value
> > returned in registers.  Can we try to use emit_group_move_into_temps here
> > instead (under the appropriate circumstances)?
> 
> It would be nice if the expander does not spill the return into memory in the
> first place if possible.  On other hand tagging compiler created memory
> location with temp decls so that aliaser has the symbolic information seems a
> useful mechanism. 

Sure - but I wonder why assign_temp doesn't do something equivalent
that doesn't require a automatic VAR_DECL to be created.

Where does the aliaser catch things with the VAR_DECL around that
it doesn't without it?

Richard.
Comment 27 Easwaran Raman 2011-06-16 17:14:38 UTC
(In reply to comment #26)
> On Wed, 15 Jun 2011, xinliangli at gmail dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> > 
> > davidxl <xinliangli at gmail dot com> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |xinliangli at gmail dot com
> > 
> > --- Comment #23 from davidxl <xinliangli at gmail dot com> 2011-06-15 23:14:50 UTC ---
> > (In reply to comment #22)
> > > > The DSE patch still leaves 2 redundant stores.
> > > 
> > > OK, I missed this, reopening...
> > > 
> > > > The following patch will enable DSE to remove those two stores. Does this
> > > > look ok?
> > > 
> > > Calling into the gimplifier from the RTL expander doesn't look appropriate.
> 
> It also should use create_tmp_var, not create_tmp_reg.  But I wonder why
> memory allocated via assign_temp isn't marked in a way to let dse
> do its job (I guess dse thinks that memory escapes?).
If the mem rtx doesn't have a tree_expression associated with it, DSE assumes the memory escapes.

> 
> > > More fundamentally, it's a little unfortunate to spill to memory a value
> > > returned in registers.  Can we try to use emit_group_move_into_temps here
> > > instead (under the appropriate circumstances)?
> > 
> > It would be nice if the expander does not spill the return into memory in the
> > first place if possible.  On other hand tagging compiler created memory
> > location with temp decls so that aliaser has the symbolic information seems a
> > useful mechanism. 
> 
> Sure - but I wonder why assign_temp doesn't do something equivalent
> that doesn't require a automatic VAR_DECL to be created.
> 
> Where does the aliaser catch things with the VAR_DECL around that
> it doesn't without it?

Is it just that when I create a VAR_DECL, TREE_ADDRESSABLE is false and may_be_aliased returns true?

> Richard.
Comment 28 rguenther@suse.de 2011-06-17 08:21:30 UTC
On Thu, 16 Jun 2011, eraman at google dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> 
> --- Comment #27 from Easwaran Raman <eraman at google dot com> 2011-06-16 17:14:38 UTC ---
> (In reply to comment #26)
> > On Wed, 15 Jun 2011, xinliangli at gmail dot com wrote:
> > 
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> > > 
> > > davidxl <xinliangli at gmail dot com> changed:
> > > 
> > >            What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >                  CC|                            |xinliangli at gmail dot com
> > > 
> > > --- Comment #23 from davidxl <xinliangli at gmail dot com> 2011-06-15 23:14:50 UTC ---
> > > (In reply to comment #22)
> > > > > The DSE patch still leaves 2 redundant stores.
> > > > 
> > > > OK, I missed this, reopening...
> > > > 
> > > > > The following patch will enable DSE to remove those two stores. Does this
> > > > > look ok?
> > > > 
> > > > Calling into the gimplifier from the RTL expander doesn't look appropriate.
> > 
> > It also should use create_tmp_var, not create_tmp_reg.  But I wonder why
> > memory allocated via assign_temp isn't marked in a way to let dse
> > do its job (I guess dse thinks that memory escapes?).
> If the mem rtx doesn't have a tree_expression associated with it, DSE assumes
> the memory escapes.

Hmm, ok.  I guess it can't really do better.

> > > > More fundamentally, it's a little unfortunate to spill to memory a value
> > > > returned in registers.  Can we try to use emit_group_move_into_temps here
> > > > instead (under the appropriate circumstances)?
> > > 
> > > It would be nice if the expander does not spill the return into memory in the
> > > first place if possible.  On other hand tagging compiler created memory
> > > location with temp decls so that aliaser has the symbolic information seems a
> > > useful mechanism. 
> > 
> > Sure - but I wonder why assign_temp doesn't do something equivalent
> > that doesn't require a automatic VAR_DECL to be created.
> > 
> > Where does the aliaser catch things with the VAR_DECL around that
> > it doesn't without it?
> 
> Is it just that when I create a VAR_DECL, TREE_ADDRESSABLE is false and
> may_be_aliased returns true?

false, yes.

Richard.
Comment 29 davidxl 2011-06-18 09:05:10 UTC
Is Easwaran's patch a reasonable way to go?

David


(In reply to comment #28)
> On Thu, 16 Jun 2011, eraman at google dot com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> > 
> > --- Comment #27 from Easwaran Raman <eraman at google dot com> 2011-06-16 17:14:38 UTC ---
> > (In reply to comment #26)
> > > On Wed, 15 Jun 2011, xinliangli at gmail dot com wrote:
> > > 
> > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> > > > 
> > > > davidxl <xinliangli at gmail dot com> changed:
> > > > 
> > > >            What    |Removed                     |Added
> > > > ----------------------------------------------------------------------------
> > > >                  CC|                            |xinliangli at gmail dot com
> > > > 
> > > > --- Comment #23 from davidxl <xinliangli at gmail dot com> 2011-06-15 23:14:50 UTC ---
> > > > (In reply to comment #22)
> > > > > > The DSE patch still leaves 2 redundant stores.
> > > > > 
> > > > > OK, I missed this, reopening...
> > > > > 
> > > > > > The following patch will enable DSE to remove those two stores. Does this
> > > > > > look ok?
> > > > > 
> > > > > Calling into the gimplifier from the RTL expander doesn't look appropriate.
> > > 
> > > It also should use create_tmp_var, not create_tmp_reg.  But I wonder why
> > > memory allocated via assign_temp isn't marked in a way to let dse
> > > do its job (I guess dse thinks that memory escapes?).
> > If the mem rtx doesn't have a tree_expression associated with it, DSE assumes
> > the memory escapes.
> 
> Hmm, ok.  I guess it can't really do better.
> 
> > > > > More fundamentally, it's a little unfortunate to spill to memory a value
> > > > > returned in registers.  Can we try to use emit_group_move_into_temps here
> > > > > instead (under the appropriate circumstances)?
> > > > 
> > > > It would be nice if the expander does not spill the return into memory in the
> > > > first place if possible.  On other hand tagging compiler created memory
> > > > location with temp decls so that aliaser has the symbolic information seems a
> > > > useful mechanism. 
> > > 
> > > Sure - but I wonder why assign_temp doesn't do something equivalent
> > > that doesn't require a automatic VAR_DECL to be created.
> > > 
> > > Where does the aliaser catch things with the VAR_DECL around that
> > > it doesn't without it?
> > 
> > Is it just that when I create a VAR_DECL, TREE_ADDRESSABLE is false and
> > may_be_aliased returns true?
> 
> false, yes.
> 
> Richard.
Comment 30 Eric Botcazou 2011-06-18 09:27:36 UTC
> Is Easwaran's patch a reasonable way to go?

So, in the end, you weren't talking about the general situation, were you?  Let's try first to avoid spilling to memory, if possible.
Comment 31 davidxl 2011-06-18 16:32:32 UTC
(In reply to comment #30)
> > Is Easwaran's patch a reasonable way to go?
> 
> So, in the end, you weren't talking about the general situation, were you? 
> Let's try first to avoid spilling to memory, if possible.


I think these two are totally independent of each other -- one should not be gated against each other. If Eawaran's approach is completely flawed, that is different story.  With this change, we at least make incremental improvement.   Not familiar with the rtl expander, but I guess the spilling was there probably for a deeper reason. If you have an insight, you can of course point it out. Again, I think these two things are not in conflict.

David
Comment 32 rguenther@suse.de 2011-06-20 09:22:31 UTC
On Sat, 18 Jun 2011, xinliangli at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44194
> 
> --- Comment #29 from davidxl <xinliangli at gmail dot com> 2011-06-18 09:05:10 UTC ---
> Is Easwaran's patch a reasonable way to go?

The approach looks reasonable (with create_tmp_var instead of _reg),
whether it is ok in the calls.c context I didn't look at.

Richard.
Comment 33 Eric Botcazou 2011-06-20 15:27:06 UTC
> I think these two are totally independent of each other -- one should not be
> gated against each other. If Eawaran's approach is completely flawed, that is
> different story.  With this change, we at least make incremental improvement.  
> Not familiar with the rtl expander, but I guess the spilling was there probably
> for a deeper reason. If you have an insight, you can of course point it out.

See comment #22.  It's an incremental improvement, but maybe we can avoid wasting time and memory by creating RTXes and Trees that will be thrown away immediately after.  I don't really see what we risk by trying.
Comment 34 davidxl 2011-06-20 16:25:04 UTC
(In reply to comment #33)
> > I think these two are totally independent of each other -- one should not be
> > gated against each other. If Eawaran's approach is completely flawed, that is
> > different story.  With this change, we at least make incremental improvement.  
> > Not familiar with the rtl expander, but I guess the spilling was there probably
> > for a deeper reason. If you have an insight, you can of course point it out.
> 
> See comment #22.  It's an incremental improvement, but maybe we can avoid
> wasting time and memory by creating RTXes and Trees that will be thrown away
> immediately after.  I don't really see what we risk by trying.

yes, of course -- since you have explicit suggestion, it can be tried. Easwaran might have looked into this more ..

David
Comment 35 Easwaran Raman 2011-06-20 16:51:18 UTC
(In reply to comment #33)
> > I think these two are totally independent of each other -- one should not be
> > gated against each other. If Eawaran's approach is completely flawed, that is
> > different story.  With this change, we at least make incremental improvement.  
> > Not familiar with the rtl expander, but I guess the spilling was there probably
> > for a deeper reason. If you have an insight, you can of course point it out.
> 
> See comment #22.  It's an incremental improvement, but maybe we can avoid
> wasting time and memory by creating RTXes and Trees that will be thrown away
> immediately after.  I don't really see what we risk by trying.

There is a comment in calls.c that says
       /* Handle calls that return values in multiple non-contiguous locations.
	 The Irix 6 ABI has examples of this.  */

I don't know if avoiding the copy breaks that ABI in any way so I didn't try that approach. In general, if the TARGET is non-NULL, I don't see how the copy can be avoided (but then, the tree EXPR corresponding to the target hopefully has the addressable flag set). In this particular case though TARGET is NULL. Is it just a matter of setting  VALREG  and let expand_assignment deal with it?

Irrespective of how this case is handled, I think there may be other instances where a store generated during expansion may be redundant, but we don't know it at the point of generation. In such cases, is this approach of associating a tree expr with the temp rtx generated by the expanded reasonable?
Comment 36 Eric Botcazou 2011-06-22 07:57:28 UTC
> There is a comment in calls.c that says
>        /* Handle calls that return values in multiple non-contiguous locations.
>      The Irix 6 ABI has examples of this.  */
> 
> I don't know if avoiding the copy breaks that ABI in any way so I didn't try
> that approach. In general, if the TARGET is non-NULL, I don't see how the copy
> can be avoided (but then, the tree EXPR corresponding to the target hopefully
> has the addressable flag set). In this particular case though TARGET is NULL.
> Is it just a matter of setting  VALREG  and let expand_assignment deal with it?

It isn't a matter of avoiding the copy, it is matter of avoiding to spill the copy to memory if possible, i.e. to copy to pseudo registers instead.  There may be prerequisites.  See comment #22 for a possible approach.

> Irrespective of how this case is handled, I think there may be other instances
> where a store generated during expansion may be redundant, but we don't know it
> at the point of generation. In such cases, is this approach of associating a
> tree expr with the temp rtx generated by the expanded reasonable?

Probably, yes, albeit as a last resort solution.
Comment 37 Eric Botcazou 2011-07-07 14:56:49 UTC
*** Bug 49646 has been marked as a duplicate of this bug. ***
Comment 38 Eric Botcazou 2011-12-05 17:48:23 UTC
Taking it over.
Comment 39 Richard Biener 2012-03-22 08:27:03 UTC
GCC 4.7.0 is being released, adjusting target milestone.
Comment 40 Uroš Bizjak 2012-03-22 17:08:54 UTC
(In reply to comment #38)
> Taking it over.

Any plans w.r.t the fix to this problem?

This PR still fails on alphaec68-unknown-linux-gnu [1], I will be glad to test any proposed change(s).

[1] http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg02428.html
Comment 41 Eric Botcazou 2012-03-22 18:17:59 UTC
> Any plans w.r.t the fix to this problem?

Yes, I have plans, but less time unfortunately.  But I'll definitely tackle it during this stage #1.
Comment 42 Chip Salzenberg 2012-08-06 01:22:43 UTC
Is bug #28831 a dup of this?
Comment 43 Eric Botcazou 2012-09-12 22:30:41 UTC
> Is bug #28831 a dup of this?

Not exactly, PR middle-end/28831 is a generic problem while this one is specific to architectures that can return structures in registers.
Comment 44 Chip Salzenberg 2012-09-12 23:21:21 UTC
Note that the x86 target has been changed in svn to use TImode for 128-bit structures, and structures bigger than 128 bits may not be passed in registers, so triggering this bug may be quite different now.
Comment 45 Eric Botcazou 2012-09-12 23:59:03 UTC
> Note that the x86 target has been changed in svn to use TImode for 128-bit
> structures, and structures bigger than 128 bits may not be passed in registers,
> so triggering this bug may be quite different now.

That's largely orthogonal, not all structures can have an integer mode.
Comment 46 Eric Botcazou 2012-09-14 13:28:52 UTC
Author: ebotcazou
Date: Fri Sep 14 13:28:44 2012
New Revision: 191302

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=191302
Log:
	PR rtl-optimization/44194
	* calls.c (expand_call): In the PARALLEL case, copy the return value
	into pseudos instead of spilling it onto the stack.
	* emit-rtl.c (adjust_address_1): Rename ADJUST into ADJUST_ADDRESS and
	add new ADJUST_OBJECT parameter.
	If ADJUST_OBJECT is set, drop the underlying object if it cannot be
	proved that the adjusted memory access is still within its bounds.
	(adjust_automodify_address_1): Adjust call to adjust_address_1.
	(widen_memory_access): Likewise.
	* expmed.c (store_bit_field_1): Call adjust_bitfield_address instead
	of adjust_address.  Do not drop the underlying object of a MEM.
	(store_fixed_bit_field): Likewise.
	(extract_bit_field_1): Likewise.  Fix oversight in recursion.
	(extract_fixed_bit_field): Likewise.
	* expr.h (adjust_address_1): Adjust prototype.
	(adjust_address): Adjust call to adjust_address_1.
	(adjust_address_nv): Likewise.
	(adjust_bitfield_address): New macro.
	(adjust_bitfield_address_nv): Likewise.
	* expr.c (expand_assignment): Handle a PARALLEL in more cases.
	(store_expr): Likewise.
	(store_field): Likewise.

	* dse.c: Fix typos in the head comment.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/calls.c
    trunk/gcc/dse.c
    trunk/gcc/emit-rtl.c
    trunk/gcc/expmed.c
    trunk/gcc/expr.c
    trunk/gcc/expr.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr44194-1.c
Comment 47 Eric Botcazou 2012-09-14 13:37:40 UTC
At long last.
Comment 48 Chip Salzenberg 2012-09-14 17:23:08 UTC
May Shub-Internet not see you as you pass.
Comment 49 Eric Botcazou 2012-10-21 12:36:21 UTC
Author: ebotcazou
Date: Sun Oct 21 12:36:16 2012
New Revision: 192651

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192651
Log:
	PR rtl-optimization/44194
	* calls.c (expand_call): Allow sibling calls in the PARALLEL case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/calls.c
Comment 50 eraman 2012-11-02 00:28:45 UTC
Author: eraman
Date: Fri Nov  2 00:28:40 2012
New Revision: 193085

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193085
Log:
	Backport 191302 and 192651 from trunk:

	2012-09-14  Eric Botcazou  <ebotcazou@adacore.com>
	
		PR rtl-optimization/44194
		* calls.c (expand_call): In the PARALLEL case, copy the return value
		into pseudos instead of spilling it onto the stack.
		* emit-rtl.c (adjust_address_1): Rename ADJUST into ADJUST_ADDRESS and
		add new ADJUST_OBJECT parameter.
		If ADJUST_OBJECT is set, drop the underlying object if it cannot be
		proved that the adjusted memory access is still within its bounds.
		(adjust_automodify_address_1): Adjust call to adjust_address_1.
		(widen_memory_access): Likewise.
		* expmed.c (store_bit_field_1): Call adjust_bitfield_address instead
		of adjust_address.  Do not drop the underlying object of a MEM.
		(store_fixed_bit_field): Likewise.
		(extract_bit_field_1): Likewise.  Fix oversight in recursion.
		(extract_fixed_bit_field): Likewise.
		* expr.h (adjust_address_1): Adjust prototype.
		(adjust_address): Adjust call to adjust_address_1.
		(adjust_address_nv): Likewise.
		(adjust_bitfield_address): New macro.
		(adjust_bitfield_address_nv): Likewise.
		* expr.c (expand_assignment): Handle a PARALLEL in more cases.
		(store_expr): Likewise.
		(store_field): Likewise.
	
		* dse.c: Fix typos in the head comment.
	2012-10-21  Eric Botcazou  <ebotcazou@adacore.com>
	
		PR rtl-optimization/44194
		* calls.c (expand_call): Allow sibling calls in the PARALLEL case.

gcc/testsuite/ChangeLog:
	2012-09-14  Eric Botcazou  <ebotcazou@adacore.com>
		* gcc.dg/pr44194-1.c: Check that there are no memory accesses left.


Modified:
    branches/google/gcc-4_7/gcc/ChangeLog.google-4_7
    branches/google/gcc-4_7/gcc/calls.c
    branches/google/gcc-4_7/gcc/dse.c
    branches/google/gcc-4_7/gcc/emit-rtl.c
    branches/google/gcc-4_7/gcc/expmed.c
    branches/google/gcc-4_7/gcc/expr.c
    branches/google/gcc-4_7/gcc/expr.h
    branches/google/gcc-4_7/gcc/testsuite/ChangeLog.google-4_7
    branches/google/gcc-4_7/gcc/testsuite/gcc.dg/pr44194-1.c