This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.
gfortran 4.3.2 about 3x slower than g77 for scientific application
- From: VelocideX <jking dot phys at gmail dot com>
- To: fortran at gcc dot gnu dot org
- Date: Wed, 13 May 2009 11:50:13 -0700 (PDT)
- Subject: gfortran 4.3.2 about 3x slower than g77 for scientific application
Hi all,
I am running a custom, computationally intensive Fortran 77 program for
scientific work.
I am on an Intel Core 2 Duo P8700 processor with 4 GB RAM, running OpenSuSE
11.1 64-bit (i.e. x86-64 architecture).
On identical input data, the program runs about 3x slower when built with
gfortran 4.3.2 than with g77 (v3.3.3-hammer). This seems a serious deficiency,
given that all the run time in the program is spent in relatively simple
calls. Both versions were compiled with the flag -O2 only.
Here are the gprof results for the g77 build:
     %  cumulative     self                self    total
  time     seconds  seconds      calls   s/call   s/call  name
 64.12        2.52     2.52      23679     0.00     0.00  vp_spvoigte__
 19.34        3.28     0.76   28574044     0.00     0.00  voigt_
  6.36        3.53     0.25         12     0.02     0.28  deriv_
  4.33        3.70     0.17   32747695     0.00     0.00  dexpf_
  3.05        3.82     0.12        877     0.00     0.00  vp_chspread__
  1.27        3.87     0.05          1     0.05     0.05  pr_sort__
  0.51        3.89     0.02          1     0.02     0.02  probks_
  0.25        3.90     0.01      23679     0.00     0.00  calcn_
  0.25        3.91     0.01        877     0.00     0.00  vp_chipconv__
  0.25        3.92     0.01         11     0.00     0.00  udchole_
  0.25        3.93     0.01          1     0.01     0.01  pldef_
  0.00        3.93     0.00     550174     0.00     0.00  ucase_
...and for the (slower) gfortran build:
     %  cumulative     self                self    total
  time     seconds  seconds      calls   s/call   s/call  name
 43.27        3.28     3.28      33885     0.00     0.00  vp_spvoigte_
 27.04        5.33     2.05       1255     0.00     0.00  vp_subchspread_
 16.36        6.57     1.24   41100608     0.00     0.00  voigt_
  6.99        7.10     0.53  528972139     0.00     0.00  dexpf_
  3.56        7.37     0.27         17     0.02     0.38  deriv_
  0.66        7.42     0.05          1     0.05     0.05  pr_sort_
  0.66        7.47     0.05          1     0.05     0.05  probks_
  0.40        7.50     0.03     478767     0.00     0.00  varythis_
  0.40        7.53     0.03          1     0.03     7.52  vp_ucoptv_
  0.26        7.55     0.02      33878     0.00     0.00  vp_archwav_
  0.13        7.56     0.01      33885     0.00     0.00  calcn_
  0.13        7.57     0.01       1255     0.00     0.00  vp_chipconv_
  0.13        7.58     0.01          1     0.01     0.01  vp_gwclinfits_
  0.00        7.58     0.00     793778     0.00     0.00  ucase_
The time spent in vp_subchspread grows massively, even though it is simply a
convolution routine. I have attached the source code for that routine:
vp_subchspread.f: http://www.nabble.com/file/p23527805/vp_subchspread.f
It's worth noting that the total time spent in dexpf() goes up considerably
as well, which is bizarre. It's a glibc call, so I would not have expected an
issue there.
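To compare the two profiles on the same footing, the self time of each routine can be divided by its call count. Below is a minimal sketch in Python, with the numbers copied from the gprof listings above. The dictionary layout and the helper name per_call_ns are illustrative choices only, and gprof's sampled timings are approximate at this granularity.

```python
# Self time (seconds) and call counts, copied from the two gprof listings.
PROFILES = {
    "g77": {
        "voigt": (0.76, 28574044),
        "dexpf": (0.17, 32747695),
    },
    "gfortran": {
        "voigt": (1.24, 41100608),
        "dexpf": (0.53, 528972139),
    },
}


def per_call_ns(self_seconds, calls):
    """Average self time per call, in nanoseconds."""
    return self_seconds / calls * 1e9


for compiler, routines in PROFILES.items():
    for name, (self_s, calls) in routines.items():
        print(f"{compiler:9s} {name:6s} calls={calls:>11d} "
              f"per-call={per_call_ns(self_s, calls):6.2f} ns")
```

Read this way, the listings suggest that dexpf is called roughly 16 times more often in the gfortran run than in the g77 run, so part of the difference may come from the call counts rather than from the cost of each call.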
Any help is appreciated. The program runs fine on g77, but I want to
parallelize with OpenMPI, which requires gfortran (AFAIK it can't be used
with g77).
Many thanks,
Julian