This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Loop fusion.


L.S.,

Last week, a colleague of mine from Meteo France gave a talk at the yearly meeting of all researchers working on HARMONIE (see http://hirlam.org), discussing the performance of our code when compiled with each of the supported compilers on the Cray XC30 at ECMWF (http://www.ecmwf.int/en/computing/our-facilities).

In the context of GCC this is relevant, because one of the three compilers is gfortran (version 4.9.2).

One of his slides discussed the differences in optimizations that the three compilers offer; I was surprised to learn that GCC/gfortran doesn't do loop fusion *at all*. Note, I discussed loop fusion (among other optimizations) at LinuxExpo 99 (http://moene.org/~toon/nwp.ps) which, unsurprisingly, was held 16 years ago :-)

Why is loop fusion important, especially in Fortran 90 and later programs?

Because without it, every array assignment is a single loop nest, isolated from related, same-shape assignments.

Consider this (artificial, but typical) example [updating atmospheric quantities after the computation of the rate of change during a time step of the integration]:

SUBROUTINE UPDATE_DT(T, U, V, Q, DTDT, DUDT, DVDT, DQDT, &
   & NLON, NLAT, NLEV, TSTEP)
...
REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, V, Q, DTDT, DUDT, DVDT, DQDT
...
T = T + TSTEP*DTDT ! Update temperature
U = U + TSTEP*DUDT ! Update east-west wind component
V = V + TSTEP*DVDT ! Update north-south wind component
Q = Q + TSTEP*DQDT ! Update specific humidity
...
END

This generates four consecutive three-deep loop nests over NLEV, NLAT, NLON.
Of course, it would be much more efficient if this were just one loop nest, as Fortran 77 programmers would write it:

DO JLEV = 1, NLEV
  DO JLAT = 1, NLAT
    DO JLON = 1, NLON
      T(JLON, JLAT, JLEV) = T(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
      U(JLON, JLAT, JLEV) = U(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
      V(JLON, JLAT, JLEV) = V(JLON, JLAT, JLEV) + TSTEP*DVDT(JLON, JLAT, JLEV)
      Q(JLON, JLAT, JLEV) = Q(JLON, JLAT, JLEV) + TSTEP*DQDT(JLON, JLAT, JLEV)
    ENDDO
  ENDDO
ENDDO

After a loop fusion optimization pass, the Fortran 90 and the Fortran 77 code should result in the same assembler output.
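
For anyone who wants to experiment, here is a minimal, self-contained sketch that runs the array-syntax form and the hand-fused form side by side and checks that they give the same values. The program name FUSION_CHECK, the small grid sizes, the fill values and the copies T2 and U2 are all made up for illustration, and only two of the four fields are updated to keep it short. It only shows that the two forms agree numerically; it says nothing about whether the compiler fuses the loops.

```fortran
! Sketch only: program name, grid sizes and fill values are hypothetical.
PROGRAM FUSION_CHECK
  IMPLICIT NONE
  INTEGER, PARAMETER :: NLON = 8, NLAT = 6, NLEV = 4
  REAL, PARAMETER :: TSTEP = 0.5
  REAL, DIMENSION(NLON, NLAT, NLEV) :: T, U, DTDT, DUDT
  REAL, DIMENSION(NLON, NLAT, NLEV) :: T2, U2   ! copies for the fused loop
  INTEGER :: JLON, JLAT, JLEV

  ! Arbitrary but reproducible start values and tendencies
  ! (small integers, so all results are exactly representable).
  DO JLEV = 1, NLEV
    DO JLAT = 1, NLAT
      DO JLON = 1, NLON
        T(JLON, JLAT, JLEV)    = REAL(JLON + JLAT + JLEV)
        U(JLON, JLAT, JLEV)    = REAL(JLON - JLAT + JLEV)
        DTDT(JLON, JLAT, JLEV) = REAL(JLON * JLEV)
        DUDT(JLON, JLAT, JLEV) = REAL(JLAT * JLEV)
      ENDDO
    ENDDO
  ENDDO
  T2 = T
  U2 = U

  ! Fortran 90 form: each assignment is a separate loop nest.
  T = T + TSTEP*DTDT
  U = U + TSTEP*DUDT

  ! Hand-fused form: one loop nest updates both fields.
  DO JLEV = 1, NLEV
    DO JLAT = 1, NLAT
      DO JLON = 1, NLON
        T2(JLON, JLAT, JLEV) = T2(JLON, JLAT, JLEV) + TSTEP*DTDT(JLON, JLAT, JLEV)
        U2(JLON, JLAT, JLEV) = U2(JLON, JLAT, JLEV) + TSTEP*DUDT(JLON, JLAT, JLEV)
      ENDDO
    ENDDO
  ENDDO

  ! Both forms must produce identical fields.
  IF (ANY(T /= T2) .OR. ANY(U /= U2)) THEN
    PRINT *, 'MISMATCH between fused and unfused updates'
    STOP 1
  ELSE
    PRINT *, 'OK: fused and unfused updates agree'
  ENDIF
END PROGRAM FUSION_CHECK
```

To see what the compiler actually makes of the two forms, one could compile this with optimization and -fdump-tree-optimized and compare the loop structure in the resulting dump.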

Is this something the Graphite infrastructure could help with? From the wiki documentation I get the impression that it only works on single loop nests, but I must confess that I am not familiar with the nomenclature in its description ...

Would it be hard to write a loop fusion pass otherwise?

Kind regards,

--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

