Bug 42919 - Argument unnecessarily spilled
Argument unnecessarily spilled
Status: NEW
Product: gcc
Classification: Unclassified
Component: middle-end
4.5.0
: P3 enhancement
: ---
Assigned To: Not yet assigned to anyone
: missed-optimization
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-01-31 22:54 UTC by Richard Biener
Modified: 2010-02-02 13:35 UTC (History)
2 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work: 2.95.2 3.4.6
Known to fail: 3.3.6 4.3.4 4.4.3 4.5.0 4.0.1
Last reconfirmed: 2010-01-31 22:56:04


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2010-01-31 22:54:01 UTC
void foo(unsigned long long *x);
void testfunc2(unsigned long long a) { foo (&a); }

generates

testfunc2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $40, %esp
        movl    8(%ebp), %eax
        movl    %eax, -16(%ebp)
        movl    12(%ebp), %eax
        movl    %eax, -12(%ebp)
        leal    -16(%ebp), %eax
        movl    %eax, (%esp)
        call    foo
        leave
        ret

which is a lot worse to what compared to what 3.4 produced:

testfunc2:
        pushl   %ebp
        movl    %esp, %ebp
        leal    8(%ebp), %eax
        subl    $8, %esp
        movl    %eax, (%esp)
        call    foo
        leave
        ret
Comment 1 Richard Biener 2010-01-31 22:56:04 UTC
It also causes us to use more stack.
Comment 2 Andrew Pinski 2010-01-31 22:57:41 UTC
Weird that 3.3 fails but 3.4 works ...
Comment 3 Michael Matz 2010-02-01 00:21:23 UTC
3.4 had this as initial RTL:

(insn 3 2 4 (set (reg:DI 60)
        (mem/f:DI (reg/f:SI 53 virtual-incoming-args) [2 a+0 S8 A32])) -1 (nil)
    (expr_list:REG_EQUIV (mem/f:DI (reg/f:SI 53 virtual-incoming-args)
     [2 a+0 S8 A32])
        (nil)))

(insn 4 3 5 (set (mem/f:DI (addressof:SI (reg/v:DI 59) 58 0xf7c9a4a4)
                 [2 a+0 S8 A64])
        (reg:DI 60)) -1 (nil)
    (nil))

(insn 15 7 16 0 (parallel [
            (set (reg/f:SI 7 sp)
                (plus:SI (reg/f:SI 7 sp)
                    (const_int -12 [0xfffffff4])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 16 15 17 0 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [0 S4 A32])
        (addressof:SI (reg/v:DI 59) 58 0xf7c9a4a4)) -1 (nil)
    (nil))

(call_insn 17 16 19 0 (call (mem:QI (symbol_ref:SI ("foo") [flags 0x41]
 <function_decl 0xf7c9a438 foo>) [0 S1 A8])
        (const_int 16 [0x10])) -1 (nil)
    (nil) (nil))

addressof RTL and handling was removed in 2004, but was our way of making this
optimization on RTL.  So this would now need to be implemented in expand
or something :-/  Well, or by implementing some more memory optimizations
on RTL (recognizing that mem1 has same content as mem2, and using &mem1 where
&mem2 was used and the former somehow is "better").  Not committing to MEM
to start with would be better, obviously.
Comment 4 Michael Matz 2010-02-01 02:41:36 UTC
Well, actually it depends.  The code generated by 3.4 might theoretically
be slower, because it potentially uses a misaligned stack slot (incoming
stack with -m32 only 4 byte aligned).  With -mpreferred-stack-boundary=2 also
newer compilers use the incoming stack slot instead of copying it around.

When we determined we need a stack slot, the flow is like so: when the slot
alignment (32bits) is not known to be enough for it's declared type (64 bits
here), _and_ the preferred stack alignment (128 per default in new compilers)
is larger than the known slot alignment, then allocate a new stack slot.

"allocate a new stack slot" is what differs between old and new compilers.
New compilers will simply, well, allocate a new slot :)  Old compilers will
only use ADDRESSOF (if the type itself can otherwise be placed into
registers), a kind of deferred stack slot allocation to wait if the 
address really needs to be taken (in new compilers this is always the case,
because we can rely on TREE_ADDRESSABLE).  If it turns out to be necessary,
then it will reuse the stack slot, possibly misaligned (and the latter could
be regarded as bug).

If the 3.4 compiler also would check for alignment in the new way (it only
did so for STRICT_ALIGNMENT targets), it too wouldn't have used the incoming
stack slot.

This additional checking (not only for STRICT_ALIGNMENT targets) came in
as fix for PR18916 (that was after ADDRESSOF was removed already, otherwise
that fix would have affected that code too).

So, I think, everything works as intended, as long as the alignment facts are
as they are:
* long long is 64 aligned
* incoming stack is 32 aligned
* preferred alignment is 128 (and that this matters seems fishy too)

One might argue that this should only matter for STRICT_ALIGNMENT targets,
and therefore that ppc (ref PR18916) is such target.  But that was altivec
code.  And with such code (SSE) x86 also is sort of a STRICT_ALIGNMENT target.
Comment 5 Michael Matz 2010-02-01 02:53:20 UTC
He, Alan already conjectured this problem:
http://gcc.gnu.org/ml/gcc-patches/2005-01/msg01182.html
Comment 6 rguenther@suse.de 2010-02-01 09:23:19 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] Argument
 unnecessarily spilled

On Mon, 1 Feb 2010, matz at gcc dot gnu dot org wrote:

> ------- Comment #4 from matz at gcc dot gnu dot org  2010-02-01 02:41 -------
> Well, actually it depends.  The code generated by 3.4 might theoretically
> be slower, because it potentially uses a misaligned stack slot (incoming
> stack with -m32 only 4 byte aligned).  With -mpreferred-stack-boundary=2 also
> newer compilers use the incoming stack slot instead of copying it around.
> 
> When we determined we need a stack slot, the flow is like so: when the slot
> alignment (32bits) is not known to be enough for it's declared type (64 bits
> here), _and_ the preferred stack alignment (128 per default in new compilers)
> is larger than the known slot alignment, then allocate a new stack slot.
> 
> "allocate a new stack slot" is what differs between old and new compilers.
> New compilers will simply, well, allocate a new slot :)  Old compilers will
> only use ADDRESSOF (if the type itself can otherwise be placed into
> registers), a kind of deferred stack slot allocation to wait if the 
> address really needs to be taken (in new compilers this is always the case,
> because we can rely on TREE_ADDRESSABLE).  If it turns out to be necessary,
> then it will reuse the stack slot, possibly misaligned (and the latter could
> be regarded as bug).
> 
> If the 3.4 compiler also would check for alignment in the new way (it only
> did so for STRICT_ALIGNMENT targets), it too wouldn't have used the incoming
> stack slot.
> 
> This additional checking (not only for STRICT_ALIGNMENT targets) came in
> as fix for PR18916 (that was after ADDRESSOF was removed already, otherwise
> that fix would have affected that code too).
> 
> So, I think, everything works as intended, as long as the alignment facts are
> as they are:
> * long long is 64 aligned
> * incoming stack is 32 aligned
> * preferred alignment is 128 (and that this matters seems fishy too)
> 
> One might argue that this should only matter for STRICT_ALIGNMENT targets,
> and therefore that ppc (ref PR18916) is such target.  But that was altivec
> code.  And with such code (SSE) x86 also is sort of a STRICT_ALIGNMENT target.

Hm, if we'd properly communicate that misalignment down to accesses
it would for large structs or -Os still preferable to not do the
extra spilling.

Richard.
Comment 7 Michael Matz 2010-02-01 11:53:59 UTC
Or making STRICT_ALIGNMENT depend on the type or mode.  So that e.g.
"!align(8) && align(4) && !STRICT_ALIGN(long long)" would be acceptable,
but "!align(16) && STRICT_ALIGN(m128d)" would not.  Or making it depend
on the typesize, so that strictness would only come into play when the type is
larger than 8 byte.  Or something like that.

I don't think propagating down (mis)alignment would help in the general case.
Some instructions simply aren't possible with misaligned memory, and need
large compensation code.  In that case we also wouldn't want to use the
incoming stack slot for -Os.
Comment 8 Richard Biener 2010-02-02 13:35:04 UTC
After all this is more an enhancement request and not really a bug.