Bug 48329 - Missed vectorization of reduction due to PRE
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.0
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Reported: 2011-03-29 08:49 UTC by Tobias Burnus
Modified: 2014-04-30 11:44 UTC
Last reconfirmed: 2011-03-29 10:31:56

Description Tobias Burnus 2011-03-29 08:49:47 UTC
Program taken from http://openmp.org/forum/viewtopic.php?f=3&t=1123

System: Intel Xeon X5570  @ 2.93GHz, SUSE SLES 11 (x86_64) [glibc-2.11.1]

No OpenMP:
  gfortran -O3 -ffast-math test2.f90
  time ./a.out ->  14.44user 0.00system 0:14.46elapsed 99%CPU

With OpenMP and OMP_NUM_THREADS=1
  gfortran -fopenmp -O3 -ffast-math test2.f90
  time ./a.out ->  7.22user 0.00system 0:07.23elapsed 99%CPU

Using gfortran 4.3.4, I also get the 7s result without -fopenmp; ditto with ifort 11.1. With OpenMP, GCC 4.6 and ifort have the same run time [modulo noise], both with 1 thread and with 2 threads.



program calcpi
USE omp_lib
    implicit none
    double precision:: h,x,sum,pi
    integer:: n,i
    double precision:: f

   f(x) = 4.0/(1.0+x**2)

   n = 2100000000

   h= 1.0 / dble(n)
   sum = 0.0
!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP SHARED(n,h) PRIVATE(x) &
!$OMP REDUCTION(+:sum)
  DO i=1, n
     x = h * (dble(i)-0.5)
     sum = sum + f(x)
  END DO
!$OMP END PARALLEL DO
  pi = h * sum
  write(*,*) 'Pi=',pi

end program calcpi
Comment 1 Richard Biener 2011-03-29 10:31:56 UTC
We vectorize the reduction if the function is outlined.  I suppose something
confuses the vectorizer in the non-OMP path.  Yep, it's PRE, so try
-fno-tree-pre.  With PRE enabled, the loop

<bb 3>:
  # i_1 = PHI <1(2), i_22(4)>
  # sum_2 = PHI <0.0(2), sum_20(4)>
  # prephitmp.9_50 = PHI <5.66893424036281234980410020432668056299176519904892395524e-20(2), D.1586_48(4)>
  # ivtmp.12_10 = PHI <2100000000(2), ivtmp.12_11(4)>
  D.1574_17 = prephitmp.9_50 + 1.0e+0;
  D.1575_18 = ((D.1574_17));
  D.1576_19 = 4.0e+0 / D.1575_18;
  sum_20 = D.1576_19 + sum_2;
  ivtmp.12_11 = ivtmp.12_10 - 1;
  if (ivtmp.12_11 == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  i_22 = i_1 + 1;
  pretmp.8_44 = (real(kind=8)) i_22;
  pretmp.8_45 = pretmp.8_44 - 5.0e-1;
  pretmp.8_46 = ((pretmp.8_45));
  pretmp.8_47 = pretmp.8_46 * 4.76190476190476200439314681013558416822206709184683859348e-10;
  D.1586_48 = __builtin_pow (pretmp.8_47, 2.0e+0);
  goto <bb 3>;

is not detected as a reduction.  The non-empty latch block (bb 4) is
probably not the only reason, but it is at least one of them.
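
For readers less used to GIMPLE dumps, the shape of the loop above can be sketched at source level. This is a hypothetical illustration, not part of the original comment and not the testcase added for this PR: the program name pre_shape, the variable xsq and the small n are assumptions made for a quick check. The first loop has the shape of the reported source, where x**2 is computed and consumed in the same iteration. The second mimics what the dump shows: the value for the first iteration is hoisted in front of the loop (the constant 5.6689e-20 in the PHI node is (0.5*h)**2 for n = 2100000000), and the value for the next iteration is computed in the latch, so it is carried across the back edge alongside sum.

```fortran
! Hypothetical source-level sketch of the loop shape in the GIMPLE dump.
! Not the reporter's program and not gfortran.dg/vect/pr48329.f90.
program pre_shape
  implicit none
  integer, parameter :: n = 1000     ! assumption: small n for a quick run
  double precision :: h, xsq, sum_src, sum_pre
  integer :: i

  h = 1.0d0 / dble(n)

  ! Shape of the source loop: x**2 is produced and used in one iteration.
  sum_src = 0.0d0
  do i = 1, n
     sum_src = sum_src + 4.0d0 / (1.0d0 + (h * (dble(i) - 0.5d0))**2)
  end do

  ! Shape after PRE: xsq plays the role of prephitmp.9_50.
  sum_pre = 0.0d0
  xsq = (h * 0.5d0)**2               ! value for i = 1, hoisted before the loop
  i = 1
  do
     sum_pre = sum_pre + 4.0d0 / (1.0d0 + xsq)   ! corresponds to <bb 3>
     if (i == n) exit
     i = i + 1                                   ! corresponds to <bb 4>, the latch
     xsq = (h * (dble(i) - 0.5d0))**2            ! computed for the NEXT iteration
  end do

  ! Both shapes compute the same sum; only the loop structure differs.
  if (abs(sum_src - sum_pre) > 1.0d-9 * abs(sum_src)) error stop 'sums differ'
  write(*,*) 'Pi (source shape) =', h * sum_src
  write(*,*) 'Pi (PRE shape)    =', h * sum_pre
end program pre_shape
```

The two loops are arithmetically equivalent, which is why the transformation is valid. The sketch only makes the extra loop-carried value and the non-empty latch visible. Whether a given compiler version vectorizes either form has to be checked against that compiler's own dumps.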
Comment 2 Dominique d'Humieres 2014-04-30 09:34:43 UTC
This seems to have been fixed during the 4.7 revisions: I see the problem with 4.6.4, but not with 4.7.3 or higher.
Comment 3 Richard Biener 2014-04-30 11:35:31 UTC
Indeed.
Comment 4 Richard Biener 2014-04-30 11:44:13 UTC
Author: rguenth
Date: Wed Apr 30 11:43:41 2014
New Revision: 209930

URL: http://gcc.gnu.org/viewcvs?rev=209930&root=gcc&view=rev
Log:
2014-04-30  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/48329
	* gfortran.dg/vect/pr48329.f90: New testcase.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/pr48329.f90
Modified:
    trunk/gcc/testsuite/ChangeLog