Testcase: struct a { short t, t1;}; int f(struct a b) { return b.t; } Without -mstrict-align, we get: .L.f: srawi 3,3,16 extsw 3,3 blr But with we get: .L.f: stw 3,48(1) nop nop nop lha 3,48(1) extsw 3,3 blr
The problem is that we force the struct to be BLKmode from DImode with: /* If structure's known alignment is less than what the scalar mode would need, and it matters, then stick with BLKmode. */ if (TYPE_MODE (type) != BLKmode && STRICT_ALIGNMENT && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (TYPE_MODE (type)))) { /* If this is the only reason this type is BLKmode, then don't force containing types to be BLKmode. */ TYPE_NO_FORCE_BLK (type) = 1; TYPE_MODE (type) = BLKmode; } Which causes us to store this to the stack all the time.
This is improved by http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00573.html But for: struct a { int t, t1; }; int f(struct a b) { return b.t + b.t1; } We still get a store/load.
(In reply to comment #2) > This is improved by > http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00573.html Also the store is still there: .L.f: srawi 0,3,16 stw 3,48(1) extsw 3,0 blr
The store is still there but there is no longer an extra load. The load was fixed for the testcase in comment #2 by: 2007-10-12 Richard Sandiford <rsandifo@nildram.co.uk> * dse.c (find_shift_sequence): Reinstate "<= UNITS_PER_WORD" condition.
Note comment #1 compiles to: srdi 0,3,32 std 3,48(1) add 3,3,0 extsw 3,3 blr
These stores to the stack are not really anything that dse can handle. It is good at removing stores addressed off the frame pointer that go dead when the function returns, but it must be more conservative with stack pointer based addresses. This is a result of the fact that gcc does not allow the program to play as many games with values accessed off of the frame as it does with values accessed off of the stack. In fact, i think it would be wrong for dse to remove these stores because (i believe) that the caller is allowed to see values accessed off the stack pointer when the function returns. On the other hand, if i am wrong about this, it would not be hard to teach dse to do some of the same tricks to stack based addresses as it currently does for frame based addresses but i think that i pressed iant pretty hard on this stuff when i was writing this pass and this was all he would allow me to do. I think that if this bug is going to be "cleared", the right way to go is do better analysis about generating those stores in the first place.
Subject: Re: -mstrict-align can an extra for struct agrument passing zadeck at naturalbridge dot com wrote: > ------- Comment #6 from zadeck at naturalbridge dot com 2007-10-29 14:09 ------- > These stores to the stack are not really anything that dse can handle. It is > good at removing stores addressed off the frame pointer that go dead when the > function returns, but it must be more conservative with stack pointer based > addresses. > > This is a result of the fact that gcc does not allow the program to play as > many games with values accessed off of the frame as it does with values > accessed off of the stack. In fact, i think it would be wrong for dse to > remove these stores because (i believe) that the caller is allowed to see > values accessed off the stack pointer when the function returns. > > On the other hand, if i am wrong about this, it would not be hard to teach dse > to do some of the same tricks to stack based addresses as it currently does for > frame based addresses but i think that i pressed iant pretty hard on this stuff > when i was writing this pass and this was all he would allow me to do. > > I think that if this bug is going to be "cleared", the right way to go is do > better analysis about generating those stores in the first place. > > > ian t and i discussed this earlier tonight. ian does not believe that the solution that we will pursue for this bug merits inclusion into stage 3 so i am writing here so we can come back for it when stage 1 opens. one of the features that is encoded into the current implementation of dse is the ability to apply very aggressive treatment to addresses that have been given their own magical alias sets. We use these magical alias sets in the current dse to mark the stack locations that reload uses during spilling as being known never to be aliased. This allows us to bypass the currently conservative aliasing code and treat these locations as if they have completely transparent semantics, which in fact they do. our plan is to mark the locations referenced off of virtual registers as having these magical alias sets iff there is no place in the code that takes the address of one of these locations. A quick scan at the start of pass_instantiate_virtual_regs should be able to identify the places where the address are taken. This should allow us to clear any dead when the function returns or globally redundant stores of the variables addressed off these virtual registers. There may be other places where these magical alias sets are applicable; if anyone wants to suggest some, we will look into it. Comments? kenny
So the problem with the stores here is that the base is arg_pointer_rtx which is still a frame pointer related offset. I think the same can be said is true of stack_pointer_rtx too. We only set frame_related for frame_pointer_rtx and hard_frame_pointer_rtx but arg_pointer_rtx will become a frame pointer later one too. The only issue is that there might not be correct dependencies with respect of arg_pointer_rtx. Kenny, Do you agree?
Andrew, What is your point here? 1) Is it your claim that anything that is arg_pointer_rtx related would automatically qualify as being safe enough to remove dead stores to? or 2) Is it your claim that if we could generalize the game proposed in comment #7 to cover the arg_pointer_rtx's also? Kenny
(In reply to comment #9) > Andrew, > > What is your point here? My point here is that currently we do: gi->frame_related = (base == frame_pointer_rtx) || (base == hard_frame_pointer_rtx); But if we change it to be: gi->frame_related = (base == frame_pointer_rtx) || (base == hard_frame_pointer_rtx) || (base == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM]); It would delete the store (at least in a 4.3 based compiler). arg_pointer_rtx is the incoming argument space so if it is a fixed register it will be also frame related and we can safely delete the stores to this space.
(In reply to comment #10) Hmm, this patch bootstraps just fine on the 4.3 branch but causes miscompares on the trunk (the 4.3 branch had checking on also when I bootstrapped it on i386-darwin8.11).
Fixed at least from GCC 4.9.4? $ /opt/at8.0/bin/gcc -O3 -c -S pr30271.c -mstrict-align $ cat pr30271.s .file "pr30271.c" .abiversion 2 .section ".toc","aw" .section ".text" .align 2 .p2align 4,,15 .globl f .type f, @function f: extsh 9,3 srawi 3,3,16 add 3,9,3 extsw 3,3 blr .long 0 .byte 0,0,0,0,0,0,0,0 .size f,.-f .ident "GCC: (GNU) 4.9.4 20150824 (Advance-Toolchain-at8.0) [ibm/gcc-4_9-branch, revision: 227153 merged from gcc-4_9-branch, revision 227151]" .section .note.GNU-stack,"",@progbits