Both of these lines are array expressions, but they are quite simple and
gfortran manages to scalarize both of them without creating temporaries.
Both loops also vectorize nicely, which is important since gas_dyn is a
single-precision program, so vectorization is a real benefit on current
CPUs (vectorization alone reduces runtime from 30s to 24s on my Athlon 64).
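
To make "scalarize without creating temporaries" concrete, here is a rough sketch in C of the two ways a compiler could evaluate a simple array expression. The expression a = b * c and both function names are made up for illustration; they are not the actual gas_dyn lines and not what gfortran literally emits.

```c
#include <stddef.h>
#include <stdlib.h>

/* Naive evaluation of the (hypothetical) array expression a = b * c:
   the right-hand side is first stored in a heap temporary and then
   copied into the destination. Returns 0 on success, -1 if the
   temporary could not be allocated. */
int assign_with_temporary(float *a, const float *b, const float *c, size_t n)
{
    float *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        tmp[i] = b[i] * c[i];
    for (size_t i = 0; i < n; i++)
        a[i] = tmp[i];
    free(tmp);
    return 0;
}

/* Scalarized evaluation: a single loop that writes straight into the
   destination. This is only valid when a does not overlap b or c,
   which is what restrict promises here. A loop of this shape is also
   an easy target for the vectorizer. */
void assign_scalarized(float *restrict a, const float *restrict b,
                       const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[i];
}
```

Both functions produce the same result for non-overlapping arrays; the second one just avoids the extra allocation and the second pass over memory.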
Simplified versions of both subroutines are attached, with comments
showing the oprofile data for the CPU_CLK_UNHALTED (basically, runtime)
and L2_CACHE_MISS events for the critical lines. For ifort, I had
to disable -ipo to get any results for CHOZDT (it was probably inlined),
but without -ipo I didn't get sensible results for EOS (it seems the line
numbers got messed up somehow for opannotate), so the results are not
entirely comparable. Nonetheless, the ifort timings change only
marginally with -ipo, so it shouldn't make a big difference.
Ifort and other commercial compilers (I have only tested ifort myself)
still manage to beat gfortran quite badly on gas_dyn, see e.g.
http://www.polyhedron.com/
http://physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/
The reason, it seems, is that ifort (and presumably other commercial
compilers with competitive scores in gas_dyn) avoids calculating
divisions and square roots, replacing them with reciprocals and
reciprocal square roots. E.g. in EOS sqrt(a/b) can be calculated as
1/sqrt(b*(1/a)). This has a big impact on performance, since the SSE
instruction set contains very fast instructions for this, rcpps, rcpss,
rsqrtps, rsqrtss (PPC/AltiVec also has equivalent instructions). These
instructions have latencies of only a few cycles vs. dozens of cycles or
more for normal division and square root. The price to be paid for
this speed is that these reciprocal instructions are accurate to
only about 12 bits, so clearly they can be enabled only under -ffast-math.
They are also available only for single precision. I'll file a
missed-optimization PR about this.
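
For reference, the sqrt(a/b) rewrite can be sketched in C with the scalar SSE intrinsics, roughly as below. This is only an illustration of the idea, not the code ifort generates. The function names are made up, the packed rcpps/rsqrtps forms would do the same on four floats at a time, and on targets without SSE the sketch falls back to plain full-precision C.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* scalar SSE intrinsics */
#else
#include <math.h>        /* sqrtf for the portable fallback */
#endif

/* Reference: a real division followed by a real square root
   (divss + sqrtss when SSE is available). */
float sqrt_ratio_exact(float a, float b)
{
#if defined(__SSE__)
    __m128 q = _mm_div_ss(_mm_set_ss(a), _mm_set_ss(b));
    return _mm_cvtss_f32(_mm_sqrt_ss(q));
#else
    return sqrtf(a / b);
#endif
}

/* sqrt(a/b) computed as rsqrt(b * rcp(a)): no division and no square
   root, only rcpss, mulss and rsqrtss. The result is good to roughly
   12 bits. Without SSE this falls back to the same formula in plain C,
   which is then full precision and shows nothing about speed. */
float sqrt_ratio_approx(float a, float b)
{
#if defined(__SSE__)
    __m128 ra = _mm_rcp_ss(_mm_set_ss(a));        /* approx. 1/a       */
    __m128 x  = _mm_mul_ss(_mm_set_ss(b), ra);    /* approx. b/a       */
    return _mm_cvtss_f32(_mm_rsqrt_ss(x));        /* approx. sqrt(a/b) */
#else
    return 1.0f / sqrtf(b * (1.0f / a));
#endif
}

/* Relative error of the approximation against the reference. */
float relative_error(float a, float b)
{
    float exact = sqrt_ratio_exact(a, b);
    float err = (sqrt_ratio_approx(a, b) - exact) / exact;
    return err < 0.0f ? -err : err;
}
```

With a = 9 and b = 4 the exact version returns 1.5, while the approximate one should land within about 0.1% of that, which is in line with the 12-bit accuracy mentioned above.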