Bug 49869

Summary: Excessive loop versioning done by vectorization + predictive commoning
Product: gcc Reporter: Richard Biener <rguenth>
Component: tree-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED WORKSFORME    
Severity: normal Keywords: missed-optimization
Priority: P3    
Version: 4.7.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 53947    

Description Richard Biener 2011-07-27 14:20:09 UTC
SUBROUTINE ANYAVG(NLVLS,HTS,PARRAY,ZBOT,NDXABV,ZTOP,NDXBLW,VALAVG)
      IMPLICIT NONE
      INTEGER I , NLVLS , NDXABV , NDXBLW
      REAL HTS(NLVLS) , PARRAY(NLVLS) , ZBOT , ZTOP , SUM , VALAVG
      REAL VALBOT , VALTOP
      IF ( ZBOT.LT.0.5 ) THEN
         ZBOT = 0.5
         NDXABV = 2
      ENDIF
      IF ( ZTOP.LT.0.51 ) THEN
         ZTOP = 0.51
         NDXBLW = 2
      ENDIF
      IF ( NDXBLW.LE.NDXABV ) GOTO 200
      DO I = NDXABV + 1 , NDXBLW
         SUM = SUM + (HTS(I)-HTS(I-1))*0.5*(PARRAY(I)+PARRAY(I-1))
      ENDDO
 200  CONTINUE
      VALAVG = SUM/(ZTOP-ZBOT)
      END

With vectorization plus predictive commoning, the loop above ends up with 5 copies.
Comment 1 Andrew Pinski 2021-08-05 23:36:51 UTC
On the trunk I only see one copy of the loop:
.L11:
        movups  (%rbx,%rax), %xmm7
        movups  0(%rbp,%rax), %xmm0
        movups  (%r9,%rax), %xmm1
        subps   %xmm7, %xmm0
        movups  (%r11,%rax), %xmm7
        addq    $16, %rax
        addps   %xmm7, %xmm1
        mulps   %xmm6, %xmm0
        mulps   %xmm1, %xmm0
        movaps  %xmm0, %xmm1
        addss   %xmm2, %xmm1
        movaps  %xmm0, %xmm2
        shufps  $85, %xmm0, %xmm2
        addss   %xmm2, %xmm1
        movaps  %xmm0, %xmm2
        unpckhps        %xmm0, %xmm2
        shufps  $255, %xmm0, %xmm0
        addss   %xmm2, %xmm1
        movaps  %xmm1, %xmm2
        addss   %xmm0, %xmm2
        cmpq    %rax, %r10
        jne     .L11
Comment 2 Richard Biener 2021-08-06 07:33:11 UTC
pcom and vect have been exchanged, so we no longer perform predictive commoning.  We're also better at identifying single-iteration loops.  Let's close the issue.
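
For readers unfamiliar with the transformation discussed here: predictive commoning reuses values computed or loaded in an earlier loop iteration rather than loading them again. The C sketch below is a hand-written illustration of that idea applied to the loop from the description. It is not GCC's actual output. The function names, the 0-based indexing, and the choice to start the sum at zero are assumptions made for the example (the Fortran test case leaves SUM uninitialized).

```c
#include <assert.h>
#include <stddef.h>

/* Plain form of the loop from the description: each iteration loads
   hts[i], hts[i-1], parray[i] and parray[i-1].
   Hypothetical helper; sums over i = lo..hi inclusive, requires lo >= 1. */
float anyavg_sum_plain(const float *hts, const float *parray,
                       size_t lo, size_t hi)
{
    assert(lo >= 1);
    float sum = 0.0f; /* assumption: the original leaves SUM uninitialized */
    for (size_t i = lo; i <= hi; i++)
        sum += (hts[i] - hts[i - 1]) * 0.5f * (parray[i] + parray[i - 1]);
    return sum;
}

/* Hand-applied predictive commoning: the values loaded as hts[i] and
   parray[i] in one iteration are carried in scalars and reused as
   hts[i-1] and parray[i-1] in the next, halving the loads per iteration. */
float anyavg_sum_pcom(const float *hts, const float *parray,
                      size_t lo, size_t hi)
{
    assert(lo >= 1);
    float sum = 0.0f;
    if (lo > hi)
        return sum; /* empty range, mirrors the GOTO 200 guard */

    float h_prev = hts[lo - 1];
    float p_prev = parray[lo - 1];
    for (size_t i = lo; i <= hi; i++) {
        float h = hts[i];
        float p = parray[i];
        sum += (h - h_prev) * 0.5f * (p + p_prev);
        h_prev = h; /* carried into the next iteration */
        p_prev = p;
    }
    return sum;
}
```

Both functions add the same terms in the same order, so they should return the same result for the same input. The carried scalars create a dependence between iterations, which is one reason the ordering of pcom relative to vect matters.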