39837 – [4.5 regression] extra spills due to RTL LICM

Bug 39837 - [4.5 regression] extra spills due to RTL LICM

Summary: [4.5 regression] extra spills due to RTL LICM

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.4.0

Importance:	P2 normal
Target Milestone:	4.6.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:	36758
Blocks:	16996
	Show dependency tree / graph

Reported:	2009-04-21 16:43 UTC by Alexander Vodomerov
Modified:	2012-07-02 10:31 UTC (History)
CC List:	8 users (show)

See Also:
Host:
Target:	arm-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed:	2010-01-02 00:42:32

Attachments
gcc-rev123918.regr.c.139r.loop2_invariant (1.15 KB, text/plain) 2009-04-21 16:45 UTC, Alexander Vodomerov	Details
gcc-rev123919.regr.c.139r.loop2_invariant (1.16 KB, text/plain) 2009-04-21 16:47 UTC, Alexander Vodomerov	Details
Hack to look for invariants inside MEMs (2.21 KB, text/plain) 2010-01-29 23:11 UTC, Steven Bosscher	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Alexander Vodomerov 2009-04-21 16:43:59 UTC

The following code

struct Glob
{
  int f1, f2;
  int x;
};
extern struct Glob g;
int func(int, int*, int);
void test()
{
  int a = 0;
  int* b = &g.x;
  do
    {
      a = *b;
    }
  while (func(a, b, a) != 0);
}
// compilation options: -march=armv5te -fpic -mthumb-interwork -mthumb -Os

is compiled to 36 bytes by GCC 4.2.1 and to 40 bytes bytes by GCC 4.3.1 and 4.4.0.

GCC 4.2.1:
test:
        push    {r4, lr}
        ldr     r3, .L7
        ldr     r2, .L7+4
.LPIC0:
        add     r3, pc
        ldr     r4, [r3, r2]
.L2:
        ldr     r2, [r4, #8]
        mov     r1, r4
        add     r1, r1, #8
        mov     r0, r2
        bl      func
        cmp     r0, #0
        bne     .L2
        @ sp needed for prologue
        pop     {r4, pc}
.L8:
        .align  2
.L7:
        .word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
        .word   g(GOT)

GCC 4.4.0:
test:
        push    {r4, r5, r6, lr}
        ldr     r3, .L6
        ldr     r2, .L6+4
.LPIC0:
        add     r3, pc
        ldr     r4, [r3, r2]
        mov     r5, r4
        add     r5, r5, #8
.L2:
        ldr     r2, [r4, #8]
        mov     r1, r5
        mov     r0, r2
        bl      func
        cmp     r0, #0
        bne     .L2
        @ sp needed for prologue
        pop     {r4, r5, r6, pc}
.L7:
        .align  2
.L6:
        .word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
        .word   g(GOT)

Bisection shows that this was changed by CL http://gcc.gnu.org/viewcvs?view=rev&revision=123919

When running with -da -ftree-dump-all options, I see that first changed dump is regr.c.139r.loop2_invariant.

Comment 1 Alexander Vodomerov 2009-04-21 16:45:58 UTC

Created attachment 17664 [details]
gcc-rev123918.regr.c.139r.loop2_invariant

A dump of loop2_invariant phase with gcc rev123918

Comment 2 Alexander Vodomerov 2009-04-21 16:47:00 UTC

Created attachment 17665 [details]
gcc-rev123919.regr.c.139r.loop2_invariant

A dump of loop2_invariant phase from gcc rev123919

Comment 3 Andrew Pinski 2009-04-21 16:49:03 UTC

       mov     r5, r4
       add     r5, r5, #8
.L2:
       ldr     r2, [r4, #8]
       mov     r1, r5

Wait, r4+8 is the same as r5 here so loop invariant should have used it.

Comment 4 Alexander Vodomerov 2009-04-23 16:39:42 UTC

A more simple example of this issue:

void func(int*);

void test()
{
  int a = 0;
  while (1) {
    func(&a);
    if (a > 12) break;
  }
}

GCC rev123918:
        push    {lr}
        sub     sp, sp, #12
        mov     r3, #0
        str     r3, [sp, #4]
.L2:
        add     r0, sp, #4
        bl      func
        ldr     r3, [sp, #4]
        cmp     r3, #12
        ble     .L2
        add     sp, sp, #12
        @ sp needed for prologue
        pop     {pc}

GCC rev123919:
test:
        push    {r4, lr}
        sub     sp, sp, #8
        mov     r3, #0
        add     r4, sp, #4
        str     r3, [sp, #4]
.L2:
        mov     r0, r4
        bl      func
        ldr     r3, [sp, #4]
        cmp     r3, #12
        ble     .L2
        add     sp, sp, #8
        @ sp needed for prologue
        pop     {r4, pc}

Comment 5 Richard Biener 2009-08-04 12:30:12 UTC

GCC 4.3.4 is being released, adjusting target milestone.

Comment 6 Steven Bosscher 2010-01-02 00:42:31 UTC

Working with "gcc (GCC) 4.5.0 20090808 (experimental) [trunk revision 150579]", I looked at what loop-invariant.c did to the test case.

This is the "t.c.154r.loop2" RTL dump (obtained with -fdump-rtl-all-slim):

;; Function test (test)

    3 NOTE_INSN_BASIC_BLOCK
    6 r135:SI=unspec[const(unspec[const(unspec[0x0] 21+0x4)] 24)] 3
    7 r135:SI=unspec[r135:SI,0x4,0x0] 4
    8 use r135:SI
    2 NOTE_INSN_FUNCTION_BEG
L20:
    5 NOTE_INSN_BASIC_BLOCK
    9 r137:SI=unspec[`g'] 3
   10 r136:SI=[r135:SI+r137:SI]
      REG_DEAD: r137:SI
      REG_EQUAL: `g'
   11 r133:SI=[r136:SI+0x8]
      REG_EQUAL: [const(`g'+0x8)]             ! Pay close attention here...
   14 r138:SI=r136:SI+0x8
      REG_DEAD: r136:SI
      REG_EQUAL: const(`g'+0x8)               ! ...and here
   15 r0:SI=r133:SI
   16 r1:SI=r138:SI
      REG_DEAD: r138:SI
      REG_EQUAL: const(`g'+0x8)
   17 r2:SI=r133:SI
      REG_DEAD: r133:SI
   18 r0:SI=call [`func'] argc:0x0
      REG_DEAD: r2:SI
      REG_DEAD: r1:SI
   19 r134:SI=r0:SI
      REG_DEAD: r0:SI
   21 pc={(r134:SI!=0x0)?L20:pc}
      REG_DEAD: r134:SI
      REG_BR_PROB: 0x238c
   24 NOTE_INSN_BASIC_BLOCK



The code has a load from the address "const(`g'+0x8)" and a load of the constant itself to a register:

   11 r133:SI=[r136:SI+0x8]
   14 r138:SI=r136:SI+0x8

loop-invariant.c correctly notices the constant in insn 14 can be moved and it moves the set to reg 138 from basic block 3 to basic block 2 (from the "t.c.156r.loop2_invariant" dump, all "deferring rescan insn*" lines removed):

Set in insn 9 is invariant (0), cost 4, depends on
Set in insn 10 is invariant (1), cost 4, depends on 0
Set in insn 14 is invariant (2), cost 4, depends on 1
Set in insn 16 is invariant (3), cost 0, depends on 2
Decided to move invariant 2
Decided to move invariant 1
Decided to move invariant 0
changing bb of uid 9
  from 3 to 2
changing bb of uid 10
  from 3 to 2
changing bb of uid 14
  from 3 to 2

The resulting RTL dump is this:

    3 NOTE_INSN_BASIC_BLOCK
    6 r135:SI=unspec[const(unspec[const(unspec[0x0] 21+0x4)] 24)] 3
    7 r135:SI=unspec[r135:SI,0x4,0x0] 4
    8 use r135:SI
    2 NOTE_INSN_FUNCTION_BEG
    9 r140:SI=unspec[`g'] 3
   10 r141:SI=[r135:SI+r140:SI]
      REG_DEAD: r137:SI
      REG_EQUAL: `g'
   14 r142:SI=r141:SI+0x8
      REG_DEAD: r136:SI
      REG_EQUAL: const(`g'+0x8)
L20:
    5 NOTE_INSN_BASIC_BLOCK
   46 r137:SI=r140:SI
   47 r136:SI=r141:SI
   11 r133:SI=[r141:SI+0x8]                   ! Aha! Why not r133=[r142] here?
      REG_EQUAL: [const(`g'+0x8)]
   48 r138:SI=r142:SI
   15 r0:SI=r133:SI
   16 r1:SI=r142:SI
      REG_DEAD: r138:SI
      REG_EQUAL: const(`g'+0x8)
   17 r2:SI=r133:SI
      REG_DEAD: r133:SI
   18 r0:SI=call [`func'] argc:0x0
      REG_DEAD: r2:SI
      REG_DEAD: r1:SI
   19 r134:SI=r0:SI
      REG_DEAD: r0:SI
   21 pc={(r134:SI!=0x0)?L45:pc}
      REG_DEAD: r134:SI
      REG_BR_PROB: 0x238c
L45:
   44 NOTE_INSN_BASIC_BLOCK
   24 NOTE_INSN_BASIC_BLOCK


After loop-invariant.c, reg 142 is set to "const(`g'+0x8)" in basic block 2, before the loop:

   14 r142:SI=r141:SI+0x8

But inside the loop, r142 is only used in insn 48 to set reg 138 (it was set in insn 14 before loop-invariant.c). The fact that r142="const(`g'+0x8)" is not used in insn 11:

   11 r133:SI=[r141:SI+0x8]
   48 r138:SI=r142:SI

Somehow, loop-invariant has failed to notice or exploit the equivalence.

Comment 7 Steven Bosscher 2010-01-29 23:11:37 UTC

Created attachment 19755 [details]
Hack to look for invariants inside MEMs

Re. comment #4:

The following code would also be correct, right?
test:
	push	{r4, lr}
	sub	sp, sp, #8
	mov	r3, #0
	add	r4, sp, #4
	str	r3, [sp, #4]
.L2:
	mov	r0, r4
	bl	func
	ldr	r3, [r4]
	cmp	r3, #12
	ble	.L2
	add	sp, sp, #8
	@ sp needed for prologue
	pop	{r4, pc}
	.size	test, .-test
	.ident	"GCC: (GNU) 4.5.0 20100129 (experimental) [trunk revision 156352]"

I have a really bad hack that makes loop-invariant look at addresses inside MEMs (attached).
(Looking at Zdenek:) Would something like this in a more polished and actually verified&tested form be a good idea?

Comment 8 rakdver@kam.mff.cuni.cz 2010-01-30 09:47:38 UTC

Subject: Re:  [4.3/4.4/4.5 regression] extra
	spills due to RTL LICM

> (Looking at Zdenek:) Would something like this in a more polished and actually
> verified&tested form be a good idea?

I think this could cause us to move the invariant addresses out of the loop
unnecessarily.  Note that moving "const(`g'+0x8)" out of the loop is only a
good idea because it is used elsewhere -- if it were only used as the address,
moving it out adds an unnecessary assignment.  This could be handled by the
patch by ignoring the address invariants in find_invariants_to_move.
Otherwise, the idea looks fine to me (assuming that there are reasons why this
is not done by fwprop),

Zdenek

Comment 9 Andrew Pinski 2010-03-02 19:59:31 UTC

I think this is the same issue as PR 36758.

Comment 10 Andrew Pinski 2010-03-02 20:10:27 UTC

Actually looking at comment #4 shows that is an exact duplicate of PR 36758 so closing as a dup.

*** This bug has been marked as a duplicate of 36758 ***

Comment 11 Sandra Loosemore 2010-07-10 21:07:23 UTC

I just checked to see if this is still a problem.  As of r162042, the example in comment #1 produces the same (bad) output as GCC 4.4.1. However, the example in comment #4 looks fixed to me, with this output:

test:
	push	{r0, r1, r2, lr}
	mov	r3, #0
	str	r3, [sp, #4]
.L2:
	add	r0, sp, #4
	bl	func
	ldr	r3, [sp, #4]
	cmp	r3, #12
	ble	.L2
	@ sp needed for prologue
	pop	{r0, r1, r2, pc}

As it was the latter test case that caused this to be marked as a duplicate of PR 36758, maybe the original test case is tripping over a different problem and needs to be re-examined?

Comment 12 Paolo Bonzini 2010-07-11 07:51:26 UTC

I agree.

Comment 13 Steven Bosscher 2010-07-11 11:40:30 UTC

Does the prototype fix of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36758#c21 help?

Comment 14 Sandra Loosemore 2010-07-11 17:47:45 UTC

Yes, it looks like the prototype fix for PR 36758 fixes the test case at the top of this issue.  The patch needs a little updating, though, and I can't say I grok the changes to the surrounding code sufficiently to be sure I've gotten it right.

Comment 15 Steven Bosscher 2010-07-11 22:48:29 UTC

Well, it's probably worth trying a little harder to grok them, then. Zdenek has already said that the fix looks OK in principle, but I am not interested at all in working on this patch anymore (especially not when others with an interest in the patch get paid for caring :-).

Comment 16 Steven Bosscher 2010-07-11 22:55:04 UTC

Brief explanation about what the patch does:

* have a pointer to the location of the invariant within an rtx. The existing code assumes a complete RHS is invariant, but with the patch GCC can move invariants out of the RHS of a SET even if the whole RHS itself is not invariant.

* Do not bail out in find_invariant_insn if the complete RHS is not invariant. Instead see if there is an address (i.e. XEXP (MEM, 0)) that is invariant.

* Update invariant motion code to handle partially invariant RHSs


One TODO for the patch (IIRC) is to only move an invariant address if it would otherwise be moved anyway. That is, only move the invariant address if there is a dependent invariant that would cause the address to be hoisted anyway. This is to avoid excessive LICM.

Comment 17 Sandra Loosemore 2010-07-13 17:13:20 UTC

As a point of clarification, I am not getting paid to care about this issue either.  :-)  At this time I have no plans to continue working on it.

Comment 18 Richard Biener 2011-06-27 12:14:31 UTC

4.3 branch is being closed, moving to 4.4.7 target.

Comment 19 Jakub Jelinek 2012-03-13 12:47:58 UTC

4.4 branch is being closed, moving to 4.5.4 target.

Comment 20 Richard Biener 2012-07-02 10:31:53 UTC

Fixed in 4.6.0, the 4.5 branch is being closed.