Here are the timings for capacita.f90 (unpatched, patched, and without
the call to gfc_trans_simple_do):
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -fomit-frame-pointer capacita.f90
[ibook-dhum] lin/test% time a.out > /dev/null
49.467u 1.360s 0:51.08 99.4% 0+0k 0+0io 39pf+0w
[ibook-dhum] lin/test% time a.out > /dev/null
49.724u 1.348s 0:51.23 99.6% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% /Volumes/MacBook/opt/gcc/gcc4.5w-0/bin/gfortran -m64 -O3 -ffast-math -funroll-loops -fomit-frame-pointer capacita.f90
[ibook-dhum] lin/test% time a.out > /dev/null
53.174u 1.380s 0:54.74 99.6% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% time a.out > /dev/null
53.273u 1.377s 0:54.83 99.6% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% /Volumes/MacBook/opt/gcc/gcc4.5w-do/bin/gfortran -m64 -O3 -ffast-math -funroll-loops -fomit-frame-pointer capacita.f90
[ibook-dhum] lin/test% time a.out > /dev/null
48.640u 1.356s 0:50.16 99.6% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% time a.out > /dev/null
48.732u 1.366s 0:50.26 99.6% 0+0k 0+0io 0pf+0w
So it is ~7% slower with your patch (mean user time 53.2s vs 49.6s) and
~2% faster (48.7s) without the call to gfc_trans_simple_do.
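For reference, the percentages can be reproduced from the user-time (`u`) column of the `time` output above by averaging the two runs of each binary. The short Python snippet below is only an illustrative sketch of that arithmetic. The labels are ad hoc, and using user time rather than wall-clock time is an assumption, although the wall-clock column gives nearly the same ratios.

```python
# User CPU times in seconds, copied from the two `time a.out` runs per build.
USER_TIMES = {
    "unpatched": [49.467, 49.724],
    "patched": [53.174, 53.273],
    "no gfc_trans_simple_do": [48.640, 48.732],
}


def mean(xs):
    """Arithmetic mean of a non-empty list of timings."""
    return sum(xs) / len(xs)


def percent_change(times, baseline):
    """Change of mean(times) relative to mean(baseline), in percent."""
    return (mean(times) / mean(baseline) - 1.0) * 100.0


base = USER_TIMES["unpatched"]
pct = {label: percent_change(t, base) for label, t in USER_TIMES.items()}

for label, p in pct.items():
    print(f"{label:>24}: mean {mean(USER_TIMES[label]):7.3f}s  {p:+5.1f}% vs unpatched")
```

With these numbers the patched build comes out about 7.3% slower, and the build without the call comes out about 1.8% faster.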