The following code struct Glob { int f1, f2; int x; }; extern struct Glob g; int func(int, int*, int); void test() { int a = 0; int* b = &g.x; do { a = *b; } while (func(a, b, a) != 0); } // compilation options: -march=armv5te -fpic -mthumb-interwork -mthumb -Os is compiled to 36 bytes by GCC 4.2.1 and to 40 bytes bytes by GCC 4.3.1 and 4.4.0. GCC 4.2.1: test: push {r4, lr} ldr r3, .L7 ldr r2, .L7+4 .LPIC0: add r3, pc ldr r4, [r3, r2] .L2: ldr r2, [r4, #8] mov r1, r4 add r1, r1, #8 mov r0, r2 bl func cmp r0, #0 bne .L2 @ sp needed for prologue pop {r4, pc} .L8: .align 2 .L7: .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+4) .word g(GOT) GCC 4.4.0: test: push {r4, r5, r6, lr} ldr r3, .L6 ldr r2, .L6+4 .LPIC0: add r3, pc ldr r4, [r3, r2] mov r5, r4 add r5, r5, #8 .L2: ldr r2, [r4, #8] mov r1, r5 mov r0, r2 bl func cmp r0, #0 bne .L2 @ sp needed for prologue pop {r4, r5, r6, pc} .L7: .align 2 .L6: .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+4) .word g(GOT) Bisection shows that this was changed by CL http://gcc.gnu.org/viewcvs?view=rev&revision=123919 When running with -da -ftree-dump-all options, I see that first changed dump is regr.c.139r.loop2_invariant.
Created attachment 17664 [details] gcc-rev123918.regr.c.139r.loop2_invariant A dump of loop2_invariant phase with gcc rev123918
Created attachment 17665 [details] gcc-rev123919.regr.c.139r.loop2_invariant A dump of loop2_invariant phase from gcc rev123919
mov r5, r4 add r5, r5, #8 .L2: ldr r2, [r4, #8] mov r1, r5 Wait, r4+8 is the same as r5 here so loop invariant should have used it.
A more simple example of this issue: void func(int*); void test() { int a = 0; while (1) { func(&a); if (a > 12) break; } } GCC rev123918: push {lr} sub sp, sp, #12 mov r3, #0 str r3, [sp, #4] .L2: add r0, sp, #4 bl func ldr r3, [sp, #4] cmp r3, #12 ble .L2 add sp, sp, #12 @ sp needed for prologue pop {pc} GCC rev123919: test: push {r4, lr} sub sp, sp, #8 mov r3, #0 add r4, sp, #4 str r3, [sp, #4] .L2: mov r0, r4 bl func ldr r3, [sp, #4] cmp r3, #12 ble .L2 add sp, sp, #8 @ sp needed for prologue pop {r4, pc}
GCC 4.3.4 is being released, adjusting target milestone.
Working with "gcc (GCC) 4.5.0 20090808 (experimental) [trunk revision 150579]", I looked at what loop-invariant.c did to the test case. This is the "t.c.154r.loop2" RTL dump (obtained with -fdump-rtl-all-slim): ;; Function test (test) 3 NOTE_INSN_BASIC_BLOCK 6 r135:SI=unspec[const(unspec[const(unspec[0x0] 21+0x4)] 24)] 3 7 r135:SI=unspec[r135:SI,0x4,0x0] 4 8 use r135:SI 2 NOTE_INSN_FUNCTION_BEG L20: 5 NOTE_INSN_BASIC_BLOCK 9 r137:SI=unspec[`g'] 3 10 r136:SI=[r135:SI+r137:SI] REG_DEAD: r137:SI REG_EQUAL: `g' 11 r133:SI=[r136:SI+0x8] REG_EQUAL: [const(`g'+0x8)] ! Pay close attention here... 14 r138:SI=r136:SI+0x8 REG_DEAD: r136:SI REG_EQUAL: const(`g'+0x8) ! ...and here 15 r0:SI=r133:SI 16 r1:SI=r138:SI REG_DEAD: r138:SI REG_EQUAL: const(`g'+0x8) 17 r2:SI=r133:SI REG_DEAD: r133:SI 18 r0:SI=call [`func'] argc:0x0 REG_DEAD: r2:SI REG_DEAD: r1:SI 19 r134:SI=r0:SI REG_DEAD: r0:SI 21 pc={(r134:SI!=0x0)?L20:pc} REG_DEAD: r134:SI REG_BR_PROB: 0x238c 24 NOTE_INSN_BASIC_BLOCK The code has a load from the address "const(`g'+0x8)" and a load of the constant itself to a register: 11 r133:SI=[r136:SI+0x8] 14 r138:SI=r136:SI+0x8 loop-invariant.c correctly notices the constant in insn 14 can be moved and it moves the set to reg 138 from basic block 3 to basic block 2 (from the "t.c.156r.loop2_invariant" dump, all "deferring rescan insn*" lines removed): Set in insn 9 is invariant (0), cost 4, depends on Set in insn 10 is invariant (1), cost 4, depends on 0 Set in insn 14 is invariant (2), cost 4, depends on 1 Set in insn 16 is invariant (3), cost 0, depends on 2 Decided to move invariant 2 Decided to move invariant 1 Decided to move invariant 0 changing bb of uid 9 from 3 to 2 changing bb of uid 10 from 3 to 2 changing bb of uid 14 from 3 to 2 The resulting RTL dump is this: 3 NOTE_INSN_BASIC_BLOCK 6 r135:SI=unspec[const(unspec[const(unspec[0x0] 21+0x4)] 24)] 3 7 r135:SI=unspec[r135:SI,0x4,0x0] 4 8 use r135:SI 2 NOTE_INSN_FUNCTION_BEG 9 r140:SI=unspec[`g'] 3 10 r141:SI=[r135:SI+r140:SI] REG_DEAD: r137:SI REG_EQUAL: `g' 14 r142:SI=r141:SI+0x8 REG_DEAD: r136:SI REG_EQUAL: const(`g'+0x8) L20: 5 NOTE_INSN_BASIC_BLOCK 46 r137:SI=r140:SI 47 r136:SI=r141:SI 11 r133:SI=[r141:SI+0x8] ! Aha! Why not r133=[r142] here? REG_EQUAL: [const(`g'+0x8)] 48 r138:SI=r142:SI 15 r0:SI=r133:SI 16 r1:SI=r142:SI REG_DEAD: r138:SI REG_EQUAL: const(`g'+0x8) 17 r2:SI=r133:SI REG_DEAD: r133:SI 18 r0:SI=call [`func'] argc:0x0 REG_DEAD: r2:SI REG_DEAD: r1:SI 19 r134:SI=r0:SI REG_DEAD: r0:SI 21 pc={(r134:SI!=0x0)?L45:pc} REG_DEAD: r134:SI REG_BR_PROB: 0x238c L45: 44 NOTE_INSN_BASIC_BLOCK 24 NOTE_INSN_BASIC_BLOCK After loop-invariant.c, reg 142 is set to "const(`g'+0x8)" in basic block 2, before the loop: 14 r142:SI=r141:SI+0x8 But inside the loop, r142 is only used in insn 48 to set reg 138 (it was set in insn 14 before loop-invariant.c). The fact that r142="const(`g'+0x8)" is not used in insn 11: 11 r133:SI=[r141:SI+0x8] 48 r138:SI=r142:SI Somehow, loop-invariant has failed to notice or exploit the equivalence.
Created attachment 19755 [details] Hack to look for invariants inside MEMs Re. comment #4: The following code would also be correct, right? test: push {r4, lr} sub sp, sp, #8 mov r3, #0 add r4, sp, #4 str r3, [sp, #4] .L2: mov r0, r4 bl func ldr r3, [r4] cmp r3, #12 ble .L2 add sp, sp, #8 @ sp needed for prologue pop {r4, pc} .size test, .-test .ident "GCC: (GNU) 4.5.0 20100129 (experimental) [trunk revision 156352]" I have a really bad hack that makes loop-invariant look at addresses inside MEMs (attached). (Looking at Zdenek:) Would something like this in a more polished and actually verified&tested form be a good idea?
Subject: Re: [4.3/4.4/4.5 regression] extra spills due to RTL LICM > (Looking at Zdenek:) Would something like this in a more polished and actually > verified&tested form be a good idea? I think this could cause us to move the invariant addresses out of the loop unnecessarily. Note that moving "const(`g'+0x8)" out of the loop is only a good idea because it is used elsewhere -- if it were only used as the address, moving it out adds an unnecessary assignment. This could be handled by the patch by ignoring the address invariants in find_invariants_to_move. Otherwise, the idea looks fine to me (assuming that there are reasons why this is not done by fwprop), Zdenek
I think this is the same issue as PR 36758.
Actually looking at comment #4 shows that is an exact duplicate of PR 36758 so closing as a dup. *** This bug has been marked as a duplicate of 36758 ***
I just checked to see if this is still a problem. As of r162042, the example in comment #1 produces the same (bad) output as GCC 4.4.1. However, the example in comment #4 looks fixed to me, with this output: test: push {r0, r1, r2, lr} mov r3, #0 str r3, [sp, #4] .L2: add r0, sp, #4 bl func ldr r3, [sp, #4] cmp r3, #12 ble .L2 @ sp needed for prologue pop {r0, r1, r2, pc} As it was the latter test case that caused this to be marked as a duplicate of PR 36758, maybe the original test case is tripping over a different problem and needs to be re-examined?
I agree.
Does the prototype fix of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36758#c21 help?
Yes, it looks like the prototype fix for PR 36758 fixes the test case at the top of this issue. The patch needs a little updating, though, and I can't say I grok the changes to the surrounding code sufficiently to be sure I've gotten it right.
Well, it's probably worth trying a little harder to grok them, then. Zdenek has already said that the fix looks OK in principle, but I am not interested at all in working on this patch anymore (especially not when others with an interest in the patch get paid for caring :-).
Brief explanation about what the patch does: * have a pointer to the location of the invariant within an rtx. The existing code assumes a complete RHS is invariant, but with the patch GCC can move invariants out of the RHS of a SET even if the whole RHS itself is not invariant. * Do not bail out in find_invariant_insn if the complete RHS is not invariant. Instead see if there is an address (i.e. XEXP (MEM, 0)) that is invariant. * Update invariant motion code to handle partially invariant RHSs One TODO for the patch (IIRC) is to only move an invariant address if it would otherwise be moved anyway. That is, only move the invariant address if there is a dependent invariant that would cause the address to be hoisted anyway. This is to avoid excessive LICM.
As a point of clarification, I am not getting paid to care about this issue either. :-) At this time I have no plans to continue working on it.
4.3 branch is being closed, moving to 4.4.7 target.
4.4 branch is being closed, moving to 4.5.4 target.
Fixed in 4.6.0, the 4.5 branch is being closed.