This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.


Re: Polyhedron benchmark on Opteron


Dominique Dhumieres wrote:
By fixing those silly loops where the last value is set to the next-to-last
value inside the loop, rather than afterwards, gfortran can chop off
0.30 seconds,

You are probably speaking of:


      DO N = 0, NP1
         BAREA(N) = AREA(RBOUND(N))
         IF (N == NP1) BAREA(N) = BAREA(N-1)
      END DO

and

      DO N = 1, NP1
         CAREA(N) = AREA(RADIUS(N))
         IF (N == NP1) CAREA(N) = CAREA(N-1)
      END DO

I did not see much gain from hand-optimizing these loops (the difference was within the timing noise).
I would expect a measurable improvement only with -ftree-vectorize. I did see one.
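
For reference, the hand optimization discussed above can be sketched as follows. This is only an illustration, not the benchmark source: the module name, the subroutine names, and the AREA function (a circle area here) are assumptions made up for the example, and NP1 >= 1 is assumed. The first routine keeps the test inside the loop; the second moves the fix-up of the last element after the loop, which also skips the AREA call whose result was being overwritten.

```fortran
module barea_demo
  implicit none
  real, parameter :: pi = 3.14159265
contains
  ! Hypothetical stand-in for the benchmark's AREA function.
  pure real function area(r)
    real, intent(in) :: r
    area = pi*r*r
  end function area

  ! Form quoted in the thread: the test on N runs on every iteration.
  subroutine fill_inside(rbound, barea, np1)
    integer, intent(in) :: np1
    real, intent(in)    :: rbound(0:np1)
    real, intent(out)   :: barea(0:np1)
    integer :: n
    do n = 0, np1
      barea(n) = area(rbound(n))
      if (n == np1) barea(n) = barea(n-1)
    end do
  end subroutine fill_inside

  ! Hoisted form: the loop stops one element early and the last
  ! value is copied from the next-to-last one afterwards.
  subroutine fill_hoisted(rbound, barea, np1)
    integer, intent(in) :: np1
    real, intent(in)    :: rbound(0:np1)
    real, intent(out)   :: barea(0:np1)
    integer :: n
    do n = 0, np1 - 1
      barea(n) = area(rbound(n))
    end do
    barea(np1) = barea(np1-1)
  end subroutine fill_hoisted
end module barea_demo
```

Both routines should fill BAREA identically as long as AREA has no side effects; the hoisted loop body has no branch, which is what would make it a simpler candidate for -ftree-vectorize.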

leaving the monster array assignment with vector sqrt in
eos as the one performance differentiation.

If you are speaking of


      VOL(:NP1) = DX(:NP1)/3.0*(BAREA(:NP1-1) + SQRT(BAREA(:NP1-1)*BAREA(1:NP1)) &
                  + BAREA(1:NP1))

I did not see any improvement by replacing it by

      do n = 1, np1
         vol(n) = dx(n)*(barea(n-1)+sqrt(barea(n-1)*barea(n))+barea(n))/3.0
      end do

However replacing

          TEMP(:NODES) = IENER(:NODES)/SHEAT
          PRES(:NODES) = (CGAMMA - 1.0)*DENS(:NODES)*IENER(:NODES)
          GAMMA(:NODES) = CGAMMA
          CS(:NODES) = SQRT(CGAMMA*PRES(:NODES)/DENS(:NODES))

by

          GAMMA(:NODES) = CGAMMA
          const = (CGAMMA - 1.0)*CGAMMA
          RSHEAT = 1.0/SHEAT
          do n = 1, nodes
             TEMP(n) = IENER(n)*RSHEAT
             PRES(n) = (CGAMMA - 1.0)*DENS(n)*IENER(n)
             CS(n) = SQRT(const*IENER(n))
          end do

gave me a significant saving: from ~17 s to ~13 s.  I did not try to separate
the contribution of the "loop fusion" (which "good" compilers should detect)
from that of the removal of the unnecessary division (I am not sure that any
compiler could perform this optimization, or would be allowed to).
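
To make the comparison concrete, here is a self-contained sketch of the two forms. The module and subroutine names are invented for the example, and all arrays are declared REAL explicitly, which is an assumption about the benchmark's declarations. The fused form relies on CGAMMA*PRES/DENS being algebraically equal to CGAMMA*(CGAMMA - 1.0)*IENER, and on multiplying by 1.0/SHEAT instead of dividing. Neither rewrite is exact in floating point, so the two routines should agree only to within rounding. That is probably why a compiler would not do this by itself without something like -ffast-math.

```fortran
module eos_demo
  implicit none
contains
  ! Array-section form: four separate array statements.
  subroutine eos_sections(nodes, cgamma, sheat, dens, iener, temp, pres, gamma, cs)
    integer, intent(in) :: nodes
    real, intent(in)    :: cgamma, sheat
    real, intent(in)    :: dens(nodes), iener(nodes)
    real, intent(out)   :: temp(nodes), pres(nodes), gamma(nodes), cs(nodes)
    temp(:nodes)  = iener(:nodes)/sheat
    pres(:nodes)  = (cgamma - 1.0)*dens(:nodes)*iener(:nodes)
    gamma(:nodes) = cgamma
    cs(:nodes)    = sqrt(cgamma*pres(:nodes)/dens(:nodes))
  end subroutine eos_sections

  ! Fused form: one loop, no division inside it.
  subroutine eos_fused(nodes, cgamma, sheat, dens, iener, temp, pres, gamma, cs)
    integer, intent(in) :: nodes
    real, intent(in)    :: cgamma, sheat
    real, intent(in)    :: dens(nodes), iener(nodes)
    real, intent(out)   :: temp(nodes), pres(nodes), gamma(nodes), cs(nodes)
    real    :: const, rsheat
    integer :: n
    gamma(:nodes) = cgamma
    const  = (cgamma - 1.0)*cgamma   ! cgamma*pres/dens reduces to const*iener
    rsheat = 1.0/sheat               ! reciprocal computed once
    do n = 1, nodes
      temp(n) = iener(n)*rsheat
      pres(n) = (cgamma - 1.0)*dens(n)*iener(n)
      cs(n)   = sqrt(const*iener(n))
    end do
  end subroutine eos_fused
end module eos_demo
```

Calling both routines on the same positive DENS and IENER arrays and comparing TEMP and CS with a small relative tolerance is a cheap way to check that a rewrite of this kind has not changed the results beyond rounding.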


Fusion of those first 2 loops is a good point, particularly if there is no L1 cache locality. As you say, the importance of fusion has been recognized at least since the advent of f90, but I doubt whether more than one compiler (none that I use) would do that for x86-64. This is one of the reasons my customers still refuse to adopt much f90 syntax.
I don't believe ifort is optimizing the expression under the sqrt(), particularly since I set options which forbid that. Evidently, that's important, since gfortran apparently doesn't vectorize it. Your change should improve accuracy as well.


