Bug 44134 - Unneeded +0.0 for c = 0.0 ; c = c+ a*b
Summary: Unneeded +0.0 for c = 0.0 ; c = c+ a*b
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.6.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2010-05-14 09:18 UTC by Thomas Koenig
Modified: 2010-05-14 12:19 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Thomas Koenig 2010-05-14 09:18:59 UTC
This code leads to the adding of 0.0, which is a nop.  Any
signalling should have been done previously.

ig25@linux-fd1f:/tmp> cat mult.f90
subroutine foo(a,b,c)
  real, intent(in) :: a,b
  real, intent(out) :: c
  c = 0.0
  c = c + a*b
end subroutine foo
ig25@linux-fd1f:/tmp> gfortran -O3 -fdump-tree-optimized -S mult.f90
ig25@linux-fd1f:/tmp> cat mult.f90.142t.optimized

;; Function foo (foo_)

foo (real(kind=4) & restrict a, real(kind=4) & restrict b, real(kind=4) & restrict c)
{
  real(kind=4) D.1542;
  real(kind=4) D.1541;
  real(kind=4) D.1540;
  real(kind=4) D.1539;

<bb 2>:
  D.1539_4 = *a_3(D);
  D.1540_6 = *b_5(D);
  D.1541_7 = D.1539_4 * D.1540_6;
  D.1542_8 = D.1541_7 + 0.0;
  *c_1(D) = D.1542_8;
  return;

}

ig25@linux-fd1f:/tmp> cat mult.s 
        .file   "mult.f90"       
        .text                    
        .p2align 4,,15           
.globl foo_                      
        .type   foo_, @function  
foo_:                            
.LFB0:                           
        movss   (%rdi), %xmm0    
        mulss   (%rsi), %xmm0    
        addss   .LC0(%rip), %xmm0
        movss   %xmm0, (%rdx)    
        ret                      
.LFE0:                           
        .size   foo_, .-foo_     
        .section        .rodata.cst4,"aM",@progbits,4
        .align 4                                     
.LC0:
        .long   0
        .section        .eh_frame,"a",@progbits
.Lframe1:
        .long   .LECIE1-.LSCIE1
.LSCIE1:
        .long   0
        .byte   0x1
        .string "zR"
        .uleb128 0x1
        .sleb128 -8
        .byte   0x10
        .uleb128 0x1
        .byte   0x3
        .byte   0xc
        .uleb128 0x7
        .uleb128 0x8
        .byte   0x90
        .uleb128 0x1
        .align 8
.LECIE1:
.LSFDE1:
        .long   .LEFDE1-.LASFDE1
.LASFDE1:
        .long   .LASFDE1-.Lframe1
        .long   .LFB0
        .long   .LFE0-.LFB0
        .uleb128 0
        .align 8
.LEFDE1:
        .ident  "GCC: (GNU) 4.6.0 20100513 (experimental)"
        .section        .note.GNU-stack,"",@progbits
Comment 1 Richard Biener 2010-05-14 12:19:48 UTC
But -0.0 + 0.0 is +0.0, so dropping the addition is only valid with -fno-signed-zeros.
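
The point about signed zeros can be checked with a small C sketch. This is only an illustration, assuming IEEE 754 arithmetic in the default round-to-nearest mode and a build without -ffast-math or -fno-signed-zeros; the function names add_positive_zero, is_negative and check_signed_zero are hypothetical and do not come from the report.

```c
#include <assert.h>
#include <math.h>    /* signbit */

/* Returns x + 0.0f, the same addition that remains in the optimized dump.
   The volatile local keeps the addition from being folded away. */
float add_positive_zero(float x)
{
    volatile float zero = 0.0f;
    return x + zero;
}

/* Returns 1 if the sign bit of x is set (true for -0.0f), otherwise 0. */
int is_negative(float x)
{
    return signbit(x) ? 1 : 0;
}

/* Self-check: adding +0.0 turns -0.0 into +0.0, so it is not a no-op. */
void check_signed_zero(void)
{
    assert(is_negative(-0.0f) == 1);
    assert(is_negative(add_positive_zero(-0.0f)) == 0);
    /* The two zeros still compare equal; only the sign bit differs. */
    assert(add_positive_zero(-0.0f) == -0.0f);
}
```

Because the two zeros compare equal, the difference only shows up through sign-sensitive operations such as signbit, copysign, or division by the result.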
Comment 2 pinskia@gmail.com 2010-05-14 13:10:34 UTC
Subject: Re:   New: Unneeded +0.0 for c = 0.0 ; c = c+ a*b



On May 14, 2010, at 2:18 AM, "tkoenig at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> wrote:

> This code leads to the adding of 0.0, which is a nop.  Any
> signalling should have been done previously.

It is not signalling that matters here but signed zero. 0.0 + (-0.0) is +0.0,
so without the "0.0 +" you can get a negative zero whenever a*b is -0.0.
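
Applied to the subroutine from the report, the difference might look like the following C sketch. It is a hand translation of foo, not compiler output; foo_with_add, foo_without_add and check_foo_variants are hypothetical names, and the sketch assumes IEEE 754 arithmetic and a build that honours signed zeros (no -ffast-math, no -fno-signed-zeros).

```c
#include <assert.h>
#include <math.h>    /* signbit */

/* C rendering of the Fortran subroutine: c = 0.0; c = c + a*b */
void foo_with_add(const float *a, const float *b, float *c)
{
    *c = 0.0f;
    *c = *c + (*a) * (*b);
}

/* Hypothetical variant with the addition of 0.0 removed. */
void foo_without_add(const float *a, const float *b, float *c)
{
    *c = (*a) * (*b);
}

/* With a = -1.0 and b = 0.0 the product is -0.0, and the two variants
   return zeros of different sign. */
void check_foo_variants(void)
{
    float a = -1.0f, b = 0.0f;
    float with_add, without_add;

    foo_with_add(&a, &b, &with_add);
    foo_without_add(&a, &b, &without_add);

    assert(!signbit(with_add));       /* +0.0: the addition cleared the sign */
    assert(signbit(without_add));     /* -0.0: the sign of the product survives */
    assert(with_add == without_add);  /* numerically equal all the same */
}
```

With -fno-signed-zeros the compiler is allowed to treat the two variants as equivalent, which is the condition named in Comment 1.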

