A follow-up to my earlier posts:
Profiling shows that the time difference between "very slow" and "full
speed" is all spent in MPI synchronization routines.
Let me first summarize my findings:
* Simulator runs at full speed on Ubuntu Hardy 32-bit using
repository MPICH and gfortran.
* Simulator runs at full speed on Gentoo 32-bit using repository
versions of MPICH and gfortran.
* Simulator runs very slowly on Ubuntu Hardy 64-bit using repository
MPICH and gfortran.
* Simulator runs at full speed on Hardy 64-bit when MPICH (not the
simulator itself!) is compiled with Intel Fortran (ifort).
It does not matter if the simulator itself is compiled with
ifort or gfortran.
* Simulator runs very slowly on Ubuntu Hardy 64-bit when MPICH is
compiled using gfortran.
It does not matter if the simulator itself is compiled with
ifort or gfortran.
These results are reproducible on several machines, both dual- and quad-core.
I am not yet certain whether this is due to a bug in gfortran, MPICH or Linux.
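
To make the reasoning behind the 64-bit findings explicit, here is a small sketch that tabulates only the four Hardy 64-bit cases where the MPICH compiler is stated and checks which factor accounts for the speed difference. The tuple layout and the helper name `explains_outcome` are my own, purely for illustration; the observations are copied from the list above.

```python
# Observations on Ubuntu Hardy 64-bit, copied from the findings above.
# Each tuple: (compiler used for MPICH, compiler used for simulator, speed)
observations = [
    ("ifort", "ifort", "full"),
    ("ifort", "gfortran", "full"),
    ("gfortran", "ifort", "slow"),
    ("gfortran", "gfortran", "slow"),
]


def explains_outcome(factor_index):
    """Return True if every value of the chosen factor maps to a single speed."""
    speeds_by_value = {}
    for row in observations:
        speeds_by_value.setdefault(row[factor_index], set()).add(row[2])
    return all(len(speeds) == 1 for speeds in speeds_by_value.values())


mpich_explains = explains_outcome(0)      # factor: compiler used for MPICH
simulator_explains = explains_outcome(1)  # factor: compiler used for simulator

print("MPICH compiler explains speed:", mpich_explains)
print("Simulator compiler explains speed:", simulator_explains)
```

In this table the compiler used for MPICH lines up with the outcome and the compiler used for the simulator does not. That only narrows the problem to how MPICH is built; it does not say where the bug is.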