This code leads to the adding of 0.0, which is a nop. Any signalling should have been done previously.

ig25@linux-fd1f:/tmp> cat mult.f90
subroutine foo(a,b,c)
  real, intent(in) :: a,b
  real, intent(out) :: c
  c = 0.0
  c = c + a*b
end subroutine foo
ig25@linux-fd1f:/tmp> gfortran -O3 -fdump-tree-optimized -S mult.f90
ig25@linux-fd1f:/tmp> cat mult.f90.142t.optimized

;; Function foo (foo_)

foo (real(kind=4) & restrict a, real(kind=4) & restrict b, real(kind=4) & restrict c)
{
  real(kind=4) D.1542;
  real(kind=4) D.1541;
  real(kind=4) D.1540;
  real(kind=4) D.1539;

<bb 2>:
  D.1539_4 = *a_3(D);
  D.1540_6 = *b_5(D);
  D.1541_7 = D.1539_4 * D.1540_6;
  D.1542_8 = D.1541_7 + 0.0;
  *c_1(D) = D.1542_8;
  return;

}

ig25@linux-fd1f:/tmp> cat mult.s
        .file   "mult.f90"
        .text
        .p2align 4,,15
.globl foo_
        .type   foo_, @function
foo_:
.LFB0:
        movss   (%rdi), %xmm0
        mulss   (%rsi), %xmm0
        addss   .LC0(%rip), %xmm0
        movss   %xmm0, (%rdx)
        ret
.LFE0:
        .size   foo_, .-foo_
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4
.LC0:
        .long   0
        .section        .eh_frame,"a",@progbits
.Lframe1:
        .long   .LECIE1-.LSCIE1
.LSCIE1:
        .long   0
        .byte   0x1
        .string "zR"
        .uleb128 0x1
        .sleb128 -8
        .byte   0x10
        .uleb128 0x1
        .byte   0x3
        .byte   0xc
        .uleb128 0x7
        .uleb128 0x8
        .byte   0x90
        .uleb128 0x1
        .align 8
.LECIE1:
.LSFDE1:
        .long   .LEFDE1-.LASFDE1
.LASFDE1:
        .long   .LASFDE1-.Lframe1
        .long   .LFB0
        .long   .LFE0-.LFB0
        .uleb128 0
        .align 8
.LEFDE1:
        .ident  "GCC: (GNU) 4.6.0 20100513 (experimental)"
        .section        .note.GNU-stack,"",@progbits
But -0.0 + 0.0 is 0.0, so the transformation is only valid for -fno-signed-zeros.
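
To make the signed-zero point concrete, here is a small sketch in Python rather than Fortran. It assumes IEEE 754 arithmetic with the default round-to-nearest mode, and uses Python's double-precision floats as a stand-in for real(kind=4). The helper names foo_with_add, foo_without_add and is_negative_zero are made up for illustration and do not come from GCC or the test case.

```python
import math


def foo_with_add(a, b):
    """Model of the reported subroutine: c = 0.0; c = c + a*b."""
    c = 0.0
    c = c + a * b
    return c


def foo_without_add(a, b):
    """Model of the result if the '+ 0.0' were optimized away: c = a*b."""
    return a * b


def is_negative_zero(x):
    # -0.0 == 0.0 compares equal, so inspect the sign bit via copysign.
    return x == 0.0 and math.copysign(1.0, x) < 0.0


if __name__ == "__main__":
    a, b = -1.0, 0.0  # a*b is -0.0 under IEEE 754
    print(foo_with_add(a, b), is_negative_zero(foo_with_add(a, b)))
    print(foo_without_add(a, b), is_negative_zero(foo_without_add(a, b)))
```

With a = -1.0 and b = 0.0 the product is -0.0. Adding 0.0 turns it into +0.0, while the version without the addition returns -0.0. The two results compare equal but differ in sign, which is why the addition can only be dropped when signed zeros are not honored.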
Subject: Re: New: Unneeded +0.0 for c = 0.0 ; c = c+ a*b

On May 14, 2010, at 2:18 AM, "tkoenig at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> wrote:

> This code leads to the adding of 0.0, which is a nop. Any
> signalling should have been done previously.

It is not signalling that matters here but signed zero. 0.0 + -0.0 == 0.0. So without the 0.0 +, you can get a negative zero.

> [...]
>
> --
>            Summary: Unneeded +0.0 for c = 0.0 ; c = c+ a*b
>            Product: gcc
>            Version: 4.6.0
>             Status: UNCONFIRMED
>           Keywords: missed-optimization
>           Severity: enhancement
>           Priority: P3
>          Component: middle-end
>         AssignedTo: unassigned at gcc dot gnu dot org
>         ReportedBy: tkoenig at gcc dot gnu dot org
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44134