This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug middle-end/36099] [4.4 Regression] early loop unrolling pass prevents vectorization, SLP doesn't do its job
- From: "dominiq at lps dot ens dot fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 13 May 2008 15:27:49 -0000
- Subject: [Bug middle-end/36099] [4.4 Regression] early loop unrolling pass prevents vectorization, SLP doesn't do its job
- References: <bug-36099-12313@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #5 from dominiq at lps dot ens dot fr 2008-05-13 15:27 -------
I just noticed today that the vectorization of the variant induct.v2.f90
depends on the -m64 flag:
[ibook-dhum] source/dir_indu% gfc -m64 -O3 -ffast-math -funroll-loops -ftree-vectorizer-verbose=2 indu.v2.f90
...
indu.v2.f90:2322: note: not vectorized: unsupported use in stmt.
indu.v2.f90:2245: note: not vectorized: unsupported unaligned store.
indu.v2.f90:2244: note: vectorizing stmts using SLP.
indu.v2.f90:2244: note: LOOP VECTORIZED.
indu.v2.f90:2146: note: not vectorized: unsupported use in stmt.
indu.v2.f90:2069: note: not vectorized: unsupported unaligned store.
indu.v2.f90:2068: note: vectorizing stmts using SLP.
indu.v2.f90:2068: note: LOOP VECTORIZED.
indu.v2.f90:1976: note: not vectorized: complicated access pattern.
indu.v2.f90:1875: note: vectorized 2 loops in function.
indu.v2.f90:1816: note: not vectorized: unsupported use in stmt.
indu.v2.f90:1771: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1770: note: vectorizing stmts using SLP.
indu.v2.f90:1770: note: LOOP VECTORIZED.
indu.v2.f90:1682: note: not vectorized: unsupported use in stmt.
indu.v2.f90:1633: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1632: note: vectorizing stmts using SLP.
indu.v2.f90:1632: note: LOOP VECTORIZED.
indu.v2.f90:1543: note: not vectorized: complicated access pattern.
indu.v2.f90:1441: note: vectorized 2 loops in function.
...
[ibook-dhum] source/dir_indu% gfc -O3 -ffast-math -funroll-loops -ftree-vectorizer-verbose=2 indu.v2.f90
...
indu.v2.f90:2334: note: LOOP VECTORIZED.
indu.v2.f90:2245: note: not vectorized: unsupported unaligned store.
indu.v2.f90:2244: note: vectorizing stmts using SLP.
indu.v2.f90:2244: note: LOOP VECTORIZED.
indu.v2.f90:2158: note: LOOP VECTORIZED.
indu.v2.f90:2069: note: not vectorized: unsupported unaligned store.
indu.v2.f90:2068: note: vectorizing stmts using SLP.
indu.v2.f90:2068: note: LOOP VECTORIZED.
indu.v2.f90:1976: note: not vectorized: complicated access pattern.
indu.v2.f90:1875: note: vectorized 4 loops in function.
indu.v2.f90:1825: note: LOOP VECTORIZED.
indu.v2.f90:1771: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1770: note: vectorizing stmts using SLP.
indu.v2.f90:1770: note: LOOP VECTORIZED.
indu.v2.f90:1691: note: LOOP VECTORIZED.
indu.v2.f90:1633: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1632: note: vectorizing stmts using SLP.
indu.v2.f90:1632: note: LOOP VECTORIZED.
indu.v2.f90:1543: note: not vectorized: complicated access pattern.
indu.v2.f90:1441: note: vectorized 4 loops in function.
...
The nested loop at line 1691, which is vectorized without -m64 but not with it, is:
...
do j = 1, 9
    c_vector(3) = 0.5_longreal * h_coil * z1gauss(j)
    !
    ! rotate coil vector into the global coordinate system and translate it
    !
    rot_c_vector(1) = rot_i_vector(1) + rotate_coil(1,3) * c_vector(3)
    rot_c_vector(2) = rot_i_vector(2) + rotate_coil(2,3) * c_vector(3)
    rot_c_vector(3) = rot_i_vector(3) + rotate_coil(3,3) * c_vector(3)
    !
    do k = 1, 9 ! <==== line 1691
        !
        ! rotate quad vector into the global coordinate system
        !
        rot_q_vector(1) = rot_q1_vector(k,1) - rot_c_vector(1)
        rot_q_vector(2) = rot_q1_vector(k,2) - rot_c_vector(2)
        rot_q_vector(3) = rot_q1_vector(k,3) - rot_c_vector(3)
        !
        ! compute and add in quadrature term
        !
        numerator = dotp * w1gauss(j) * w2gauss(k)
        dotp2 = rot_q_vector(1)*rot_q_vector(1) + rot_q_vector(2)*rot_q_vector(2) + &
                rot_q_vector(3)*rot_q_vector(3)
        denominator = sqrt(dotp2)
        l12_lower = l12_lower + numerator/denominator
    end do
end do
...
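For anyone who wants to compare the two -ftree-vectorizer-verbose=2 outputs without reading them side by side, here is a minimal sketch in Python. It assumes the note format shown in the logs above ("file:line: note: message"). The helper names vectorized_lines and only_in_second are made up for this illustration and are not part of GCC. The sample text is copied from the two runs above and is limited to the function starting at line 1441.

```python
import re

# Matches vectorizer notes such as "indu.v2.f90:1691: note: LOOP VECTORIZED."
NOTE_RE = re.compile(r"^\S+:(\d+): note: (.*)$")


def vectorized_lines(log_text):
    """Return the set of source line numbers reported as LOOP VECTORIZED."""
    found = set()
    for raw in log_text.splitlines():
        match = NOTE_RE.match(raw.strip())
        if match and match.group(2).startswith("LOOP VECTORIZED"):
            found.add(int(match.group(1)))
    return found


def only_in_second(log_a, log_b):
    """Sorted line numbers vectorized in log_b but not in log_a."""
    return sorted(vectorized_lines(log_b) - vectorized_lines(log_a))


# Excerpt of the run with -m64 (hypothetical variable name, data from above).
LOG_M64 = """\
indu.v2.f90:1816: note: not vectorized: unsupported use in stmt.
indu.v2.f90:1771: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1770: note: vectorizing stmts using SLP.
indu.v2.f90:1770: note: LOOP VECTORIZED.
indu.v2.f90:1682: note: not vectorized: unsupported use in stmt.
indu.v2.f90:1633: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1632: note: vectorizing stmts using SLP.
indu.v2.f90:1632: note: LOOP VECTORIZED.
indu.v2.f90:1543: note: not vectorized: complicated access pattern.
indu.v2.f90:1441: note: vectorized 2 loops in function.
"""

# Excerpt of the run without -m64 (data from above).
LOG_DEFAULT = """\
indu.v2.f90:1825: note: LOOP VECTORIZED.
indu.v2.f90:1771: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1770: note: vectorizing stmts using SLP.
indu.v2.f90:1770: note: LOOP VECTORIZED.
indu.v2.f90:1691: note: LOOP VECTORIZED.
indu.v2.f90:1633: note: not vectorized: unsupported unaligned store.
indu.v2.f90:1632: note: vectorizing stmts using SLP.
indu.v2.f90:1632: note: LOOP VECTORIZED.
indu.v2.f90:1543: note: not vectorized: complicated access pattern.
indu.v2.f90:1441: note: vectorized 4 loops in function.
"""

if __name__ == "__main__":
    # Loops vectorized only when -m64 is absent.
    print(only_in_second(LOG_M64, LOG_DEFAULT))  # prints [1691, 1825]
```

For this excerpt the loops at lines 1691 and 1825 are the ones vectorized only without -m64. With -m64 the nearby lines 1682 and 1816 report "unsupported use in stmt" instead.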
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36099