This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
- From: "venkataramanan.kumar at amd dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 13 Jan 2012 13:40:05 +0000
- Subject: [Bug middle-end/51848] New: GCC is not able to vectorize when a constant value is also added to the sum of array expression inside a loop.
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51848
Bug #: 51848
Summary: GCC is not able to vectorize when a constant value is
also added to the sum of array expression inside a
loop.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: venkataramanan.kumar@amd.com
The test case below is modelled on the "air.f90" benchmark from the Polyhedron
suite.
With vectorization, "air" runs about 16% faster when built with ICC than with
GCC, but I am not sure whether all of that difference comes from vectorization
alone.
While analysing the assembly differences, I found that GCC does not vectorize
the case below, whereas ICC does.
(Snip)
DIMENSION NPX(30) , NPY(30)
COMMON /XD1 / MXPy, NDX
COMMON /XD2 / MXPx
MXPx = 0
DO i = 1 , NDX
   MXPx = MXPx + NPX(i)+1
ENDDO
!
END
(Snip)
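For readers who want to experiment with the pattern outside Fortran, the loop
can be mirrored in C. This is only a sketch: the function names are made up for
illustration, and nothing here claims that a given GCC revision does or does
not vectorize either form. The first function keeps the association as written
in the source, (sum + a[i]) + 1, where the running sum feeds an intermediate
add. The second groups it as sum + (a[i] + 1), so the running sum appears in a
single add per iteration. For integer inputs that do not overflow, both return
the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C mirror of the Fortran loop, association as written:
 * (sum + a[i]) + 1. The running sum is an operand of the first add. */
int sum_plus_one_as_written(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum + a[i] + 1;
    return sum;
}

/* Same computation regrouped as sum + (a[i] + 1): the constant is added
 * to the array element first, then one add updates the running sum. */
int sum_plus_one_regrouped(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int t = a[i] + 1;
        sum += t;
    }
    return sum;
}
```

Compiling both functions with the vectorizer report enabled is one way to check
whether the grouping alone changes the outcome on a particular compiler.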
Machine: x86_64-unknown-linux-gnu
GCC revision: 183151
ICC revision: 12.1.0.233 Build 2
gcc -Ofast -march=corei7-avx -limf -lsvml -L /tool/intel/lib/intel64/
-mveclibabi=svml pattern1.f90 -ftree-vectorizer-verbose=2 -S
Analyzing loop at pattern1.f90:5
5: not vectorized: unsupported use in stmt.
5: not vectorized: unsupported use in stmt.
pattern1.f90:9: note: vectorized 0 loops in function.
ifort -march=corei7-avx -O3 -limf -lsvml -L /tool/intel/lib/intel64/
pattern1.f90 -vec-report -S -fsource-asm
pattern1.f90(5): (col. 7) remark: LOOP WAS VECTORIZED.
For the expression:
MXPx = MXPx + NPX(i)+1
The constant "1" is converted to a vector packet, as shown below:
.L_2il0floatpacket.0:
.long 0x00000001,0x00000001,0x00000001,0x00000001
With the constant in vector form, the whole expression becomes vectorizable.
The assembly for the vectorized portion of the ICC output is shown below:
vmovdqu .L_2il0floatpacket.0(%rip), %xmm0
..B1.5: # Preds ..B1.5 ..B1.4
vpaddd _unnamed_main$_$NPX.0.1(,%rax,4), %xmm0, %xmm2 #6.10
addq $4, %rax #5.7
vpaddd %xmm2, %xmm1, %xmm1 #6.22
cmpq %rdx, %rax #5.7
jb ..B1.5 # Prob 96% #5.7
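To make the ICC loop easier to follow, here is a scalar C model of what the two
vpaddd instructions appear to compute. It is a sketch under assumptions, not
ICC's actual code generation: the function name is hypothetical, the
accumulator is assumed to start at zero, and the final horizontal sum and the
scalar remainder loop are assumed, since neither appears in the snippet above.
Each iteration handles four 32-bit elements, matching the "addq $4, %rax" step.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* four 32-bit integers per 128-bit xmm register */

/* Hypothetical scalar model of the vectorized reduction. */
int sum_plus_one_lanes(const int *a, size_t n)
{
    assert(a != NULL || n == 0);

    const int ones[LANES] = {1, 1, 1, 1};  /* models .L_2il0floatpacket.0 (%xmm0) */
    int acc[LANES] = {0, 0, 0, 0};         /* models %xmm1; zero start is assumed */
    size_t i = 0;

    for (; i + LANES <= n; i += LANES) {
        for (int l = 0; l < LANES; l++) {
            int t = a[i + l] + ones[l];    /* first vpaddd: NPX + ones -> %xmm2 */
            acc[l] += t;                   /* second vpaddd: %xmm2 + %xmm1 -> %xmm1 */
        }
    }

    /* Assumed epilogue: add the four partial sums together. */
    int sum = acc[0] + acc[1] + acc[2] + acc[3];

    /* Assumed scalar remainder for trip counts not divisible by LANES. */
    for (; i < n; i++)
        sum += a[i] + 1;

    return sum;
}
```

For example, calling sum_plus_one_lanes on a 10-element array runs two
four-wide iterations and then two scalar remainder steps.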
Please share your thoughts on this, and on whether GCC's vectorizer could be
improved to handle this pattern.