[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

dominiq at lps dot ens dot fr gcc-bugzilla@gcc.gnu.org
Fri Nov 20 13:45:00 GMT 2009



------- Comment #9 from dominiq at lps dot ens dot fr  2009-11-20 13:45 -------
I am rather confused by some comments:

(1) Although I am not fluent in x86 assembly, I am fairly sure that no code
in eval is vectorized (assembly taken from this PR or from the original post
http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).

(2) If I am not mistaken, the k loop always handles exactly 3 elements: i, i+n,
and i+2*n.

(3) On a 2.1 GHz Core 2 Duo, I see only small timing differences between 4.3.4
and trunk, between -O1 and -O3, and between 32- and 64-bit mode.

Now, if I make the following change:

--- pr42108_1_db.f90    2009-11-20 14:14:05.000000000 +0100
+++ pr42108_1_db_1.f90  2009-11-20 14:15:24.000000000 +0100
@@ -7,12 +7,10 @@ subroutine  eval(foo1,foo2,foo3,foo4,x,n
   do i=2,n
     foo3(i)=foo2*foo4(i)
     do  j=1,i-1
-      temp=0.0d0
-      jmini=j-i
-      do  k=i,nnd,n
-        temp=temp+(x(k)-x(k+jmini))**2
-      end do
-      temp = sqrt(temp+foo1)
+      temp = sqrt( (x(i) - x(j))**2 &
+                  +(x(i+n) - x(j+n))**2 &
+                  +(x(i+2*n)-x(j+2*n))**2 &
+                  +foo1)
       foo3(i)=foo3(i)+temp*foo4(j)
       foo3(j)=foo3(j)+temp*foo4(i)
     end do

the run time drops from 9.2s to 5.5s for n=20000. So the k loop is not unrolled
automatically, even with -funroll-loops.
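For illustration, here is a minimal, hypothetical self-contained check. It is not part of the testcase attached to this PR. It verifies that the hand-unrolled expression from the patch gives the same temp as the original strided k loop. The check assumes nnd = 3*n, meaning x holds three blocks of n values. That value is an assumption: the diff does not show how nnd is set. Under that assumption the k loop runs exactly three times (k = i, i+n, i+2*n), and the program checks this as well.

```fortran
! Hypothetical check, assuming nnd = 3*n (x holds three blocks of n values).
! Compares the original strided k loop with the hand-unrolled expression
! for every (i, j) pair visited by eval, and counts the k-loop trips.
program check_unroll
  implicit none
  integer, parameter :: n = 50, nnd = 3*n   ! nnd = 3*n is an assumption
  double precision :: x(nnd), foo1, temp_loop, temp_unrolled
  integer :: i, j, k, jmini, trips

  call random_number(x)
  foo1 = 1.0d-3

  do i = 2, n
    do j = 1, i-1
      ! Original form: strided k loop over i, i+n, i+2*n.
      temp_loop = 0.0d0
      jmini = j - i
      trips = 0
      do k = i, nnd, n
        temp_loop = temp_loop + (x(k) - x(k+jmini))**2
        trips = trips + 1
      end do
      temp_loop = sqrt(temp_loop + foo1)

      ! Hand-unrolled form, as in the patch.
      temp_unrolled = sqrt( (x(i) - x(j))**2 &
                          + (x(i+n) - x(j+n))**2 &
                          + (x(i+2*n) - x(j+2*n))**2 &
                          + foo1)

      ! With nnd = 3*n and 2 <= i <= n, the loop must run exactly 3 times.
      if (trips /= 3) error stop 'k loop did not run exactly 3 times'
      if (abs(temp_loop - temp_unrolled) > 1.0d-12) error stop 'mismatch'
    end do
  end do
  print '(a)', 'ok'
end program check_unroll
```

With gfortran this should compile as free-form source, for example saved as check_unroll.f90. It stops with an error if either check fails.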


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108


