Bug 30521 - "if (i == n) ++i;" or "i += i == n;"?
Summary: "if (i == n) ++i;" or "i += i == n;"?
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.1.1
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
: 37240 (view as bug list)
Depends on: 37239
Blocks: 21485 70546
  Show dependency treegraph
 
Reported: 2007-01-21 00:31 UTC by esigra
Modified: 2018-03-01 13:27 UTC (History)
4 users (show)

See Also:
Host:
Target: i?86-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-03-03 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description esigra 2007-01-21 00:31:59 UTC
Suppose that we have a function f that can be written in 2 ways with identical result:
unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return i;}
unsigned int f(unsigned int i, unsigned int n) {++i; i += i == n; return i;}

g++ -O3 produces different code for the 2 versions:
       pushl   %ebp
.LCFI0:
+       xorl    %edx, %edx
       movl    %esp, %ebp
.LCFI1:
-       movl    8(%ebp), %edx
-       leal    1(%edx), %eax
+       movl    8(%ebp), %eax
+       incl    %eax
       cmpl    12(%ebp), %eax
-       je      .L6
       popl    %ebp
-       ret
-       .p2align 4,,7
-.L6:
-       popl    %ebp
-       leal    2(%edx), %eax
+       sete    %dl
+       addl    %edx, %eax
       ret
.LFE2:

This implies that one of the following 3 statements holds:
1. The 2 versions of f are indeed not identical.
2. The 2 versions of the generated code are equally efficient, so the difference does not matter.
3. g++ generates suboptimal code for one of the versions of f.

An answer at [http://gcc.gnu.org/ml/gcc-help/2007-01/msg00253.html] suggests that statement 3 holds.
Comment 1 Andrew Pinski 2007-01-21 08:54:13 UTC
I think this has been fixed already, for PPC with 4.0.2, we get:
_f:
        mr r2,r3
        addi r3,r3,1
        cmpw cr7,r3,r4
        bnelr cr7
        addi r3,r2,2
        blr
        .align 2
        .p2align 4,,15
        .globl _f1
_f1:
        addi r3,r3,1
        xor r0,r3,r4
        subfic r0,r0,0
        addze r0,r3
        mr r3,r0
        blr


While on the trunk we get:
_f:
        addi r3,r3,1
        xor r0,r3,r4
        subfic r0,r0,0
        addze r0,r3
        mr r3,r0
        blr
        .align 2
        .globl _f1
_f1:
        addi r3,r3,1
        xor r0,r3,r4
        subfic r0,r0,0
        addze r0,r3
        mr r3,r0
        blr
Comment 2 Andrew Pinski 2007-01-22 00:29:40 UTC
CSE is pushing i+=1; down into the conditional which causes ifcvt not to catch the  if(i) i+=1;.
Comment 3 Paolo Bonzini 2008-08-27 04:40:03 UTC
*** Bug 37240 has been marked as a duplicate of this bug. ***
Comment 4 Paolo Bonzini 2008-08-27 04:40:31 UTC
see PR37240 for another testcase.
Comment 5 Steven Bosscher 2008-11-22 09:51:57 UTC
I created a t.c with both functions in it:

unsigned int f1(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return
i;}
unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}

With today's trunk on x86_64, I get:

	.file	"t.c"
	.text
	.align 16
.globl f1
	.type	f1, @function
f1:
.LFB0:
	leal	1(%rdi), %eax
	addl	$2, %edi
	cmpl	%esi, %eax
	cmove	%edi, %eax
	ret
.LFE0:
	.size	f1, .-f1
	.align 16
.globl f2
	.type	f2, @function
f2:
.LFB1:
	addl	$1, %edi
	xorl	%eax, %eax
	cmpl	%esi, %edi
	sete	%al
	addl	%edi, %eax
	ret
.LFE1:
	.size	f2, .-f2
	.section	.eh_frame,"aw",@progbits
.Lframe1:
	.long	.LECIE1-.LSCIE1
.LSCIE1:
	.long	0x0
	.byte	0x1
	.string	"zR"
	.byte	0x1
	.byte	0x78
	.byte	0x10
	.byte	0x1
	.byte	0x3
	.byte	0xc
	.byte	0x7
	.byte	0x8
	.byte	0x11
	.byte	0x10
	.byte	0x1
	.align 8
.LECIE1:
.LSFDE1:
	.long	.LEFDE1-.LASFDE1
.LASFDE1:
	.long	.LASFDE1-.Lframe1
	.long	.LFB0
	.long	.LFE0-.LFB0
	.byte	0x0
	.align 8
.LEFDE1:
.LSFDE3:
	.long	.LEFDE3-.LASFDE3
.LASFDE3:
	.long	.LASFDE3-.Lframe1
	.long	.LFB1
	.long	.LFE1-.LFB1
	.byte	0x0
	.align 8
.LEFDE3:
	.ident	"GCC: (GNU) 4.4.0 20081122 (experimental) [trunk revision 142117]"
	.section	.note.GNU-stack,"",@progbits


So still not the same functions.

For x86 (32bit) with -march=core2, I get similar code (f1 with a cmove). Without a -march option, I get code with a jump.
Comment 6 Steven Bosscher 2008-11-22 10:04:30 UTC
Ah, now I see what Pinski meant at comment #2.

Before CSE, we still have the original code for f1:
unsigned int f(unsigned int i, unsigned int n)
{
  i.20 = i + 1;
  if (i.20 == n) i.20 = i.20 + 1;
  return i.20;
}

After CSE (but before the first if-conversion pass) it's been transformed to:
unsigned int f(unsigned int i, unsigned int n)
{
  i.20 = i + 1;
  if (i.20 == n) i.20 = i + 2;
  return i.20;
}

The form of the code before CSE is caught in noce_try_addcc. The second form obviously not (because we don't know that the value of i.20 is "i + 1").

So this comes down to a pass ordering problem. Or, one  could argue that CSE should not perform this transformation if (say) "i.20 = i + 1" is not made dead code by the transformation to "i.20 = i + 2".
Comment 7 Paolo Bonzini 2008-11-22 10:15:42 UTC
Subject: Re:  "if (i == n) ++i;" or "i += i == n;"?


> The form of the code before CSE is caught in noce_try_addcc. The second form
> obviously not (because we don't know that the value of i.20 is "i + 1").
> 
> So this comes down to a pass ordering problem.

I'll just point out that the df merge allowed much better freedom in
pass ordering.  Ifconv1 could be anticipated before CSE and fwprop.

Paolo
Comment 8 Andrew Pinski 2012-03-04 01:44:32 UTC
We could do this at the tree level.  First do add a late PHIOPT which does the conversion to a COND_EXPR and then simplify a?b+1:b into a+b.  Likewise for a?b+2:b+1 to a+b+1.