Bug 27986 - [4.2 Regression] jump to middle of loop on entry when using old version of a variable
Summary: [4.2 Regression] jump to middle of loop on entry when using old version of a variable
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.2.0
Importance: P2 normal
Target Milestone: 4.3.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2006-06-10 22:23 UTC by dean
Modified: 2009-03-30 16:57 UTC
CC: 5 users

See Also:
Host:
Target:
Build:
Known to work: 3.4.0 4.3.4 4.4.0
Known to fail: 4.0.0 4.0.4 4.1.0 4.2.0 4.2.5
Last reconfirmed: 2007-08-06 14:59:51


Description dean 2006-06-10 22:23:53 UTC
In the following code, GCC chooses registers in such a way that it creates an extra copy on every loop iteration and has to jump past that copy to enter the loop... it's probably easiest to describe just by looking at the code:

% cat jmp-over.c
void foo(int *v, int *d, int g)
{
  int s = v[1];
  int s0;
  do {
    s0 = s;
    s += *d;
    ++d;
  } while (s < g);
  v[0] = s0;
}

% gcc -g -O3 -Wall   -c -o jmp-over.o jmp-over.c
% objdump -dr jmp-over.o

jmp-over.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   8b 4f 04                mov    0x4(%rdi),%ecx
   3:   eb 02                   jmp    7 <foo+0x7>
   5:   89 c1                   mov    %eax,%ecx
   7:   89 c8                   mov    %ecx,%eax
   9:   03 06                   add    (%rsi),%eax
   b:   48 83 c6 04             add    $0x4,%rsi
   f:   39 d0                   cmp    %edx,%eax
  11:   7c f2                   jl     5 <foo+0x5>
  13:   89 0f                   mov    %ecx,(%rdi)
  15:   c3                      retq

the jump-over is unnecessary...

        mov    0x4(%rdi),%ecx
1:      mov    %ecx,%eax
        add    (%rsi),%ecx
        add    $0x4,%rsi
        cmp    %edx,%ecx
        jl     1b
        mov    %eax,(%rdi)
        retq

-dean

% gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../gcc/configure --prefix=/home/odo/gcc --host=x86_64-linux-gnu --disable-multilib --enable-languages=c,c++ --disable-bootstrap
Thread model: posix
gcc version 4.2.0 20060603 (experimental)
Comment 1 Andrew Pinski 2006-06-10 22:32:01 UTC
PPC gives:
_foo:
        lwz r2,4(r3)
        b L2
L7:
        mr r2,r0
L2:
        lwz r0,0(r4)
        addi r4,r4,4
        add r0,r2,r0
        cmpw cr7,r0,r5
        blt+ cr7,L7
        stw r2,0(r3)
        blr

The tree level is:
<bb 2>:
  s = *(v + 4B);

<L0>:;
  s.31 = s + MEM[base: d];
  d = d + 4B;
  if (s.31 < g) goto <L6>; else goto <L1>;

<L6>:;
  s = s.31;
  goto <bb 3> (<L0>);

<L1>:;
  *v = s;
Comment 2 Andrew Pinski 2006-09-03 07:18:51 UTC
Confirmed. Out-of-SSA should have split the variable's live range so that the conflicting parts are not changed inside the loop.
before Out of SSA:
  # s_2 = PHI <s_5(0), s_9(1)>;
  # d_1 = PHI <d_6(0), d_10(1)>;
<L0>:;
  D.1287_8 = MEM[base: d_1];
  s_9 = s_2 + D.1287_8;
  d_10 = d_1 + 4B;
  if (s_9 < g_11) goto <L0>; else goto <L1>;

  # s_7 = PHI <s_2(1)>;
<L1>:;
  *v_3 = s_7;


We should have changed s_7 into s.1 and added a move inside the loop before s_2 gets clobbered.

I don't know how to fix out-of-SSA though.
Comment 3 Steven Bosscher 2006-09-30 09:25:46 UTC
Typically something I'd hope the new out-of-ssa pass would improve.
Comment 4 Andrew Macleod 2006-10-02 13:56:04 UTC
This is not something out of ssa can resolve on its own.

given this sequence:
  # s_2 = PHI <s_5(0), s_9(1)>;
  # d_1 = PHI <d_6(0), d_10(1)>;
<L0>:;
  D.1287_8 = MEM[base: d_1];
  s_9 = s_2 + D.1287_8;          <<<---  s_2 and s_9 have different values
  d_10 = d_1 + 4B;
  if (s_9 < g_11) goto <L0>; else goto <L1>;

  # s_7 = PHI <s_2(1)>;          <<<--- original value of s_2 used here
<L1>:;
  *v_3 = s_7;

When s_9 is assigned s_2 + D.1287_8  , they have different values, so when s_2 is used in the PHI assigned to s_7, there is a stretch over which s_9 and s_2 are both live, and contain different values.

out of ssa cannot assign them to the same variable.

you are going to have a copy in the loop no matter what you do... 

right now we get:
<bb 2>:
  s = *(v + 4B);

<L0>:;
  s.31 = s + MEM[base: d]{*d};
  d = d + 4B;
  if (s.31 < g) goto <L6>; else goto <L1>;

<L6>:;
  s = s.31;
  goto <bb 3> (<L0>);

<L1>:;
  *v = s;
  return;


the other alternative would be:
<L0>:;
  s.31 = s
  s = s + MEM[base: d]{*d};
  d = d + 4B;
  if (s < g) goto <L6>; else goto <L1>;

<L6>:;
  goto <bb 3> (<L0>);

<L1>:;
  *v = s.31;


Is that any better? It looks pretty much the same to me.  It might help on a 2-address machine where you need r1 = r1 + exp, but that's about it.


So I'm not sure I understand what you want out-of-SSA to do with the code here.
Comment 5 Andrew Macleod 2006-10-02 14:01:26 UTC
I guess you can flatten the goto slightly. 

This is still something that is not really in the scope of out-of-SSA.  Part of the root of this problem is that the PHI is really just a copy, but the fact that it remains a PHI prevents anyone from moving it.

It's possible that we might be able to do something about it with some of the RABLET work. It doesn't really reduce any register pressure, but we might be able to recognize that it flattens the CFG a bit... I'm not crazy about putting it there though.

Of course, any optimization could do the same thing. A quick pass to look for these cases and transform those PHIs into copies at the appropriate place would accomplish the same thing.
Comment 6 Steven Bosscher 2006-10-02 21:46:34 UTC
Re comment #5: "A quick pass to look for these cases and transform those PHIs into copies at the appropriate place would accomplish the same thing."

That is exactly what I'd expect the out-of-ssa pass to take care of.  Much like the insert_backedge_copies() stuff in the current out-of-ssa pass.

Maybe I'm expecting too much, though...
Comment 7 Andrew Macleod 2006-10-02 22:13:54 UTC
It's not that you are expecting too much, just in the wrong place from my point of view :-)  Changing the out-of-SSA algorithm or implementation isn't going to change this code. It requires changing the code out-of-SSA sees.

insert_backedge_copies() ought to be its own pass as well, as far as I'm concerned. It's the very first thing called, and is not related to the rest of what out-of-SSA does at all.

It should be moved out, and we can have an optimization pass just before out-of-ssa which looks for this kind of thing.  That can happen when I start adding the pre-passes for register pressure work if no one does it earlier.
Comment 8 Gabriel Dos Reis 2007-02-03 17:36:33 UTC
Won't fix in GCC-4.0.x.  Adjusting milestone.
Comment 9 Mark Mitchell 2007-03-10 01:40:51 UTC
In this message:

http://gcc.gnu.org/ml/gcc/2007-03/msg00249.html

Andrew Macleod indicates that this will be difficult to fix in pre-4.3 compilers.
Comment 10 davidxl 2008-02-19 06:00:44 UTC
(In reply to comment #0)
> % cat jmp-over.c
> void foo(int *v, int *d, int g)
> {
>   int s = v[1];
>   int s0;
>   do {
>     s0 = s;
>     s += *d;
>     ++d;
>   } while (s < g);
>   v[0] = s0;
> }
> [...]


// David Li

Note that the assignment s0 = s in the loop is mostly dead except for the last occurrence. So it should be optimized into:

do {
   s += *d;
   ++d;
} while (s < g);
v[0] = (s-*(d-1));


Comment 11 dean 2008-02-19 17:42:53 UTC
Subject: Re: [4.0/4.1/4.2/4.3 Regression] jump to middle of loop on entry when using old version of a variable

On Mon, 19 Feb 2008, xinliangli at gmail dot com wrote:

> Note that assignment of s0 = s in the loop is mostly dead except for the last
> occurence. So it should be optimized into:
> 
> do {
>    s += *d;
>    ++d;
> } while (s < g);
> v[0] = (s-*(d-1));

FWIW, the real loop I pulled this fragment from has a bit more dependent on
s0... I just used this fragment because it showcased a particular problem
I observed in the full loop.  I can't include the full loop because it's
in proprietary code :(

-dean
Comment 12 Joseph S. Myers 2008-07-04 21:22:50 UTC
Closing 4.1 branch.
Comment 13 Uroš Bizjak 2009-02-12 13:59:35 UTC
GCC: (GNU) 4.4.0 20090129 (experimental) [trunk revision 143750]:

foo:
	movl	4(%rdi), %eax	#, s
.L2:
	movl	%eax, %ecx	# s, s.29
	addl	(%rsi), %ecx	#* d, s.29
	addq	$4, %rsi	#, d
	movl	%eax, %r8d	# s, s.30
	cmpl	%edx, %ecx	# g, s
	movl	%ecx, %eax	# s.29, s
	jl	.L2	#,
	movl	%r8d, (%rdi)	# s.30,* v
	ret

GCC: (GNU) 4.3.4 20090211 (prerelease) [gcc-4_3-branch revision 144096]:

foo:
	movl	4(%rdi), %ecx	#, s
.L2:
	movl	%ecx, %eax	# s, s.19
	addl	(%rsi), %eax	#* d, s.19
	addq	$4, %rsi	#, d
	movl	%ecx, %r8d	# s, s.20
	cmpl	%edx, %eax	# g, s
	movl	%eax, %ecx	# s.19, s
	jl	.L2	#,
	movl	%r8d, (%rdi)	# s.20,* v
	ret

Fixed for 4.3 and 4.4 branch.
Comment 14 Joseph S. Myers 2009-03-30 16:57:51 UTC
Closing 4.2 branch, fixed in 4.3.