[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
dominiq at lps dot ens dot fr
gcc-bugzilla@gcc.gnu.org
Fri Nov 20 13:45:00 GMT 2009
------- Comment #9 from dominiq at lps dot ens dot fr 2009-11-20 13:45 -------
I am rather confused by some comments:
(1) Although I am not fluent with x86 assembly, I am pretty sure that no code
in eval is vectorized (assembly taken from this pr or from the original post
http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).
(2) If I am not mistaken, the k loop always handles 3 elements: i, i+n, and
i+2*n.
(3) On a Core 2 Duo at 2.1 GHz, I see only small changes in timing from 4.3.4
to trunk, from -O1 to -O3, and between 32- and 64-bit mode.
Now if I do the following change:
--- pr42108_1_db.f90 2009-11-20 14:14:05.000000000 +0100
+++ pr42108_1_db_1.f90 2009-11-20 14:15:24.000000000 +0100
@@ -7,12 +7,10 @@ subroutine eval(foo1,foo2,foo3,foo4,x,n
   do i=2,n
      foo3(i)=foo2*foo4(i)
      do j=1,i-1
-        temp=0.0d0
-        jmini=j-i
-        do k=i,nnd,n
-           temp=temp+(x(k)-x(k+jmini))**2
-        end do
-        temp = sqrt(temp+foo1)
+        temp = sqrt( (x(i) - x(j))**2 &
+                    +(x(i+n) - x(j+n))**2 &
+                    +(x(i+2*n)-x(j+2*n))**2 &
+                    +foo1)
         foo3(i)=foo3(i)+temp*foo4(j)
         foo3(j)=foo3(j)+temp*foo4(i)
      end do
With this change I go from 9.2 s to 5.5 s for n=20000. So the k loop is not
unrolled automatically, even with -funroll-loops.
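
For anyone who wants to convince themselves that the two forms of the inner
computation agree, here is a minimal sketch. It is not taken from the test
case: the program name, n = 5, the random contents of x, the value of foo1,
and above all nnd = 3*n are assumptions made for illustration (nnd is not
defined in the part of the source shown in the diff). With nnd = 3*n the k
loop runs exactly three times for every i in 2..n, which matches point (2).

```fortran
! Sketch only: compares the strided k loop with the hand-unrolled
! expression from the diff above.
! Assumptions (not from the original test case): nnd = 3*n, n = 5,
! random x, foo1 = 0.5d0.
program check_unroll
  implicit none
  integer, parameter :: n = 5, nnd = 3*n
  double precision :: x(nnd), foo1, t_loop, t_flat, maxdiff
  integer :: i, j, k, jmini

  call random_number(x)
  foo1 = 0.5d0
  maxdiff = 0.0d0

  do i = 2, n
    do j = 1, i-1
      ! Original form: k takes the values i, i+n, i+2*n.
      t_loop = 0.0d0
      jmini = j - i
      do k = i, nnd, n
        t_loop = t_loop + (x(k) - x(k+jmini))**2
      end do
      t_loop = sqrt(t_loop + foo1)

      ! Hand-unrolled form, as in the modified source.
      t_flat = sqrt( (x(i) - x(j))**2 &
                    +(x(i+n) - x(j+n))**2 &
                    +(x(i+2*n) - x(j+2*n))**2 &
                    +foo1)

      ! Track the largest disagreement between the two forms.
      maxdiff = max(maxdiff, abs(t_loop - t_flat))
    end do
  end do

  print *, 'max |t_loop - t_flat| =', maxdiff
end program check_unroll
```

The printed difference should be zero or at rounding level, since both forms
add the same three squared terms and foo1. The check says nothing about
timing; it only shows that the rewrite is a manual full unroll of a loop
with a fixed trip count of 3.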
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108