This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848

             Bug #: 51848
           Summary: GCC is not able to vectorize when a constant value is
                    also added to the sum of array expression inside a
                    loop.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: venkataramanan.kumar@amd.com


The test case below is derived from the "air.f90" benchmark of the Polyhedron suite.

With ICC, "air" runs about 16% faster than with GCC. Vectorization appears to
contribute, but I am not sure the whole gap comes from vectorization alone.

While analysing the assembly differences, I found that GCC does not vectorize
the case below, whereas ICC does.

(Snip)
      DIMENSION NPX(30) , NPY(30)
      COMMON /XD1   / MXPy, NDX
      COMMON /XD2  / MXPx
      MXPx = 0
      DO i = 1 , NDX
         MXPx = MXPx + NPX(i)+1
      ENDDO
!
      END
(Snip)
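
For readers more used to C, here is a hypothetical C transliteration of the loop above (the names npx/ndx stand in for NPX/NDX and are not from the report). It also spells out the algebra behind the pattern: ignoring overflow, adding the invariant 1 on every iteration is the same as adding NDX once after a plain sum reduction. This is a sketch of the equivalence only, not a claim about how either compiler transforms the loop.

```c
/* Hypothetical C transliteration of the Fortran snippet. */

/* Scalar form of the loop body MXPx = MXPx + NPX(i) + 1. */
int sum_plus_one(const int *npx, int ndx)
{
    int mxpx = 0;
    for (int i = 0; i < ndx; i++)
        mxpx = mxpx + npx[i] + 1;   /* reduction plus a loop-invariant addend */
    return mxpx;
}

/* Equivalent form (ignoring overflow): the invariant "+ 1" is added ndx
   times, so it can be hoisted out as "+ ndx", leaving a plain sum
   reduction inside the loop. */
int sum_then_add_count(const int *npx, int ndx)
{
    int mxpx = 0;
    for (int i = 0; i < ndx; i++)
        mxpx += npx[i];
    return mxpx + ndx;
}
```

For NPX = 0, 1, ..., 29 and NDX = 30, both functions return 465 (435 + 30).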


Machine: x86_64-unknown-linux-gnu
GCC revision: 183151
ICC revision: 12.1.0.233 Build 2

gcc -Ofast -march=corei7-avx  -limf -lsvml -L /tool/intel/lib/intel64/
-mveclibabi=svml   pattern1.f90 -ftree-vectorizer-verbose=2 -S

Analyzing loop at pattern1.f90:5

5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.


ifort -march=corei7-avx  -O3  -limf -lsvml -L /tool/intel/lib/intel64/ 
pattern1.f90  -vec-report -S -fsource-asm

pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.


For the expression: 

MXPx = MXPx + NPX(i)+1


ICC converts the constant "1" into a packed vector constant, with 1 in each of
the four 32-bit lanes:

 .L_2il0floatpacket.0:
        .long   0x00000001,0x00000001,0x00000001,0x00000001
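
The same "splat" can be sketched in C with the GCC/Clang vector_size extension. This is an illustration of the idea, assuming a 16-byte vector of four ints to match one XMM register; it is not ICC's code, and the function name is hypothetical.

```c
/* GCC/Clang vector extension: four 32-bit ints, 16 bytes (one XMM register). */
typedef int v4si __attribute__((vector_size(16)));

/* The scalar constant 1 replicated into every lane, like the
   0x00000001 x 4 packet in ICC's output. */
static const v4si ones = {1, 1, 1, 1};

/* Hypothetical helper: add the splatted constant to four consecutive
   elements at once, i.e. NPX(i) + 1 for four values of i in one operation. */
v4si add_one_to_four(v4si chunk)
{
    return chunk + ones;   /* lane-wise add, as vpaddd does */
}
```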

With the constant in vector form, the whole expression becomes vectorizable.
The vectorized loop in ICC's assembly looks like this:

vmovdqu   .L_2il0floatpacket.0(%rip), %xmm0

..B1.5:                         # Preds ..B1.5 ..B1.4
        vpaddd    _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2 #6.10
        addq      $4, %rax                                      #5.7
        vpaddd    %xmm2, %xmm1, %xmm1                           #6.22
        cmpq      %rdx, %rax                                    #5.7
        jb        ..B1.5        # Prob 96%                      #5.7
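
As a rough reading aid, the loop shape above can be sketched in C with GCC/Clang vector extensions: xmm0 holds the splatted ones, xmm1 the running vector accumulator, and xmm2 the per-iteration temporary. The horizontal reduction and the scalar epilogue for leftover elements are not part of the quoted excerpt; they are assumptions added so the sketch works for any trip count. The function name is hypothetical, and this is neither ICC's actual output nor a proposed GCC change.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical sketch of the vectorized MXPx = MXPx + NPX(i) + 1 loop. */
int vec_sum_plus_one(const int *npx, int ndx)
{
    const v4si ones = {1, 1, 1, 1};   /* vmovdqu .L_2il0floatpacket.0, %xmm0 */
    v4si acc = {0, 0, 0, 0};          /* vector accumulator (xmm1) */
    int i = 0;

    for (; i + 4 <= ndx; i += 4) {
        v4si chunk;
        memcpy(&chunk, &npx[i], sizeof chunk);  /* unaligned load of 4 ints */
        v4si t = chunk + ones;                  /* vpaddd NPX(i..i+3), %xmm0, %xmm2 */
        acc = acc + t;                          /* vpaddd %xmm2, %xmm1, %xmm1 */
    }

    /* Assumed: horizontal reduction of the four partial sums. */
    int s = acc[0] + acc[1] + acc[2] + acc[3];

    /* Assumed: scalar epilogue for the remaining ndx % 4 elements. */
    for (; i < ndx; i++)
        s = s + npx[i] + 1;
    return s;
}
```

Each vector iteration consumes four elements of NPX and four copies of the constant, which is why the index in the assembly advances by 4 (addq $4, %rax).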

Please share your thoughts on this and on possible vectorizer improvements in
GCC for this pattern.

