Suppose that we have a function f that can be written in 2 ways with identical result: unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return i;} unsigned int f(unsigned int i, unsigned int n) {++i; i += i == n; return i;} g++ -O3 produces different code for the 2 versions: pushl %ebp .LCFI0: + xorl %edx, %edx movl %esp, %ebp .LCFI1: - movl 8(%ebp), %edx - leal 1(%edx), %eax + movl 8(%ebp), %eax + incl %eax cmpl 12(%ebp), %eax - je .L6 popl %ebp - ret - .p2align 4,,7 -.L6: - popl %ebp - leal 2(%edx), %eax + sete %dl + addl %edx, %eax ret .LFE2: This implies that one of the following 3 statements holds: 1. The 2 versions of f are indeed not identical. 2. The 2 versions of the generated code are equally efficient, so the difference does not matter. 3. g++ generates suboptimal code for one of the versions of f. An answer at [http://gcc.gnu.org/ml/gcc-help/2007-01/msg00253.html] suggests that statement 3 holds.
I think this has been fixed already, for PPC with 4.0.2, we get: _f: mr r2,r3 addi r3,r3,1 cmpw cr7,r3,r4 bnelr cr7 addi r3,r2,2 blr .align 2 .p2align 4,,15 .globl _f1 _f1: addi r3,r3,1 xor r0,r3,r4 subfic r0,r0,0 addze r0,r3 mr r3,r0 blr While on the trunk we get: _f: addi r3,r3,1 xor r0,r3,r4 subfic r0,r0,0 addze r0,r3 mr r3,r0 blr .align 2 .globl _f1 _f1: addi r3,r3,1 xor r0,r3,r4 subfic r0,r0,0 addze r0,r3 mr r3,r0 blr
CSE is pushing i+=1; down into the conditional which causes ifcvt not to catch the if(i) i+=1;.
*** Bug 37240 has been marked as a duplicate of this bug. ***
see PR37240 for another testcase.
I created a t.c with both functions in it: unsigned int f1(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return i;} unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;} With today's trunk on x86_64, I get: .file "t.c" .text .align 16 .globl f1 .type f1, @function f1: .LFB0: leal 1(%rdi), %eax addl $2, %edi cmpl %esi, %eax cmove %edi, %eax ret .LFE0: .size f1, .-f1 .align 16 .globl f2 .type f2, @function f2: .LFB1: addl $1, %edi xorl %eax, %eax cmpl %esi, %edi sete %al addl %edi, %eax ret .LFE1: .size f2, .-f2 .section .eh_frame,"aw",@progbits .Lframe1: .long .LECIE1-.LSCIE1 .LSCIE1: .long 0x0 .byte 0x1 .string "zR" .byte 0x1 .byte 0x78 .byte 0x10 .byte 0x1 .byte 0x3 .byte 0xc .byte 0x7 .byte 0x8 .byte 0x11 .byte 0x10 .byte 0x1 .align 8 .LECIE1: .LSFDE1: .long .LEFDE1-.LASFDE1 .LASFDE1: .long .LASFDE1-.Lframe1 .long .LFB0 .long .LFE0-.LFB0 .byte 0x0 .align 8 .LEFDE1: .LSFDE3: .long .LEFDE3-.LASFDE3 .LASFDE3: .long .LASFDE3-.Lframe1 .long .LFB1 .long .LFE1-.LFB1 .byte 0x0 .align 8 .LEFDE3: .ident "GCC: (GNU) 4.4.0 20081122 (experimental) [trunk revision 142117]" .section .note.GNU-stack,"",@progbits So still not the same functions. For x86 (32bit) with -march=core2, I get similar code (f1 with a cmove). Without a -march option, I get code with a jump.
Ah, now I see what Pinski meant at comment #2. Before CSE, we still have the original code for f1: unsigned int f(unsigned int i, unsigned int n) { i.20 = i + 1; if (i.20 == n) i.20 = i.20 + 1; return i.20; } After CSE (but before the first if-conversion pass) it's been transformed to: unsigned int f(unsigned int i, unsigned int n) { i.20 = i + 1; if (i.20 == n) i.20 = i + 2; return i.20; } The form of the code before CSE is caught in noce_try_addcc. The second form obviously not (because we don't know that the value of i.20 is "i + 1"). So this comes down to a pass ordering problem. Or, one could argue that CSE should not perform this transformation if (say) "i.20 = i + 1" is not made dead code by the transformation to "i.20 = i + 2".
Subject: Re: "if (i == n) ++i;" or "i += i == n;"? > The form of the code before CSE is caught in noce_try_addcc. The second form > obviously not (because we don't know that the value of i.20 is "i + 1"). > > So this comes down to a pass ordering problem. I'll just point out that the df merge allowed much better freedom in pass ordering. Ifconv1 could be anticipated before CSE and fwprop. Paolo
We could do this at the tree level. First do add a late PHIOPT which does the conversion to a COND_EXPR and then simplify a?b+1:b into a+b. Likewise for a?b+2:b+1 to a+b+1.