Program taken from http://openmp.org/forum/viewtopic.php?f=3&t=1123

System: Intel Xeon X5570 @ 2.93GHz, SUSE SLES 11 (x86_64) [glibc-2.11.1]

No OpenMP:
  gfortran -O3 -ffast-math test2.f90
  time ./a.out
  -> 14.44user 0.00system 0:14.46elapsed 99%CPU

With OpenMP and OMP_NUM_THREADS=1:
  gfortran -fopenmp -O3 -ffast-math test2.f90
  time ./a.out
  -> 7.22user 0.00system 0:07.23elapsed 99%CPU

Using gfortran 4.3.4, I get the 7s result also without -fopenmp; ditto with ifort 11.1.

With OpenMP the run time of GCC 4.6 and ifort is exactly the same [modulo noise] for 1 and 2 threads.

program calcpi
  USE omp_lib
  implicit none
  double precision:: h,x,sum,pi
  integer:: n,i
  double precision:: f

  f(x) = 4.0/(1.0+x**2)

  n = 2100000000
  h= 1.0 / dble(n)
  sum = 0.0
!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP SHARED(n,h) PRIVATE(x) &
!$OMP REDUCTION(+:sum)
  DO i=1, n
    x = h * (dble(i)-0.5)
    sum = sum + f(x)
  END DO
!$OMP END PARALLEL DO
  pi = h * sum
  write(*,*) 'Pi=',pi
end program calcpi
We vectorize the reduction if the function is outlined. I suppose something confuses the vectorizer in the non-OMP path.

Yep, it's PRE, so try -fno-tree-pre. With PRE, the loop

<bb 3>:
  # i_1 = PHI <1(2), i_22(4)>
  # sum_2 = PHI <0.0(2), sum_20(4)>
  # prephitmp.9_50 = PHI <5.66893424036281234980410020432668056299176519904892395524e-20(2), D.1586_48(4)>
  # ivtmp.12_10 = PHI <2100000000(2), ivtmp.12_11(4)>
  D.1574_17 = prephitmp.9_50 + 1.0e+0;
  D.1575_18 = ((D.1574_17));
  D.1576_19 = 4.0e+0 / D.1575_18;
  sum_20 = D.1576_19 + sum_2;
  ivtmp.12_11 = ivtmp.12_10 - 1;
  if (ivtmp.12_11 == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  i_22 = i_1 + 1;
  pretmp.8_44 = (real(kind=8)) i_22;
  pretmp.8_45 = pretmp.8_44 - 5.0e-1;
  pretmp.8_46 = ((pretmp.8_45));
  pretmp.8_47 = pretmp.8_46 * 4.76190476190476200439314681013558416822206709184683859348e-10;
  D.1586_48 = __builtin_pow (pretmp.8_47, 2.0e+0);
  goto <bb 3>;

is not detected as a reduction. The non-empty latch block (<bb 4>) is at least one reason for that, though probably not the only one.
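For readers less used to GIMPLE dumps, the loop shape above can be approximated at source level. The following is only a hand-written sketch of what the dump suggests PRE did, not compiler output: the program name calcpi_pre_sketch and the variable xsq are made up for illustration, and the dump's down-counting ivtmp is replaced by a plain test on i. The point it tries to show is that x**2 for the next iteration is computed at the bottom of the loop, after the exit test, and carried around the back edge.

```fortran
program calcpi_pre_sketch
  ! Hypothetical source-level picture of the loop after PRE.
  ! Not generated by GCC; names and structure are assumptions.
  implicit none
  double precision :: h, sum, xsq, pi
  integer :: n, i

  n = 2100000000
  h = 1.0d0 / dble(n)
  sum = 0.0d0

  ! x**2 for i = 1, computed before the loop; this corresponds to the
  ! constant 5.668...e-20 in the PHI node of <bb 3>.
  xsq = (h * 0.5d0)**2
  i = 1
  do
    ! Loop header (<bb 3>): only the reduction update and the exit test.
    sum = sum + 4.0d0 / (1.0d0 + xsq)
    if (i == n) exit
    ! Latch (<bb 4>): work for the next iteration placed after the exit
    ! test, so the latch block is no longer empty.
    i = i + 1
    xsq = (h * (dble(i) - 0.5d0))**2
  end do

  pi = h * sum
  write(*,*) 'Pi=', pi
end program calcpi_pre_sketch
```

Compiling the original test2.f90 with -fno-tree-pre, as suggested above, should keep the whole computation of x and f(x) in the loop body instead.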
This seems to have been fixed during the 4.7 revisions: I see the problem with 4.6.4, but not with 4.7.3 or higher.
Indeed.
Author: rguenth
Date: Wed Apr 30 11:43:41 2014
New Revision: 209930

URL: http://gcc.gnu.org/viewcvs?rev=209930&root=gcc&view=rev
Log:
2014-04-30  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/48329
	* gfortran.dg/vect/pr48329.f90: New testcase.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/pr48329.f90
Modified:
    trunk/gcc/testsuite/ChangeLog