This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: Polyhedron shootout between g95 and gfortran
- From: dominiq at lps dot ens dot fr (Dominique Dhumieres)
- To: fortran at gcc dot gnu dot org
- Date: Thu, 23 Feb 2006 00:05:20 +0100
- Subject: Re: Polyhedron shootout between g95 and gfortran
Steven Bosscher wrote:
> (It looks like tfft didn't work with g95 at all???)
Although this is certainly off topic for this list, I suspect that
you are using a version with the warning
"Default integer of 64 bits, may break older programs". If so,
you must compile tfft.f90 with the option -i4. On an AMD machine
(2 GHz, 1 MB L2 cache) I get 10.0s. Note that the timing of this
program reflects the size of the L2 cache more than anything else.
> 1) gfortran has one aermod test failing:
On a G5 with GNU Fortran 95 (GCC) 4.2.0 20060218 (experimental), I get
...
--------- Summary of Total Messages --------
A Total of 0 Fatal Error Message(s)
A Total of 0 Warning Message(s)
A Total of 638 Informational Message(s)
A Total of 534 Calm Hours Identified
A Total of 104 Missing Hours Identified ( 4.81 Percent)
******** FATAL ERROR MESSAGES ********
*** NONE ***
******** WARNING MESSAGES ********
*** NONE ***
************************************
*** AERMOD Finishes Successfully ***
************************************
> 2) gfortran is _awful_ for induct... Does anyone know why?
induct.f90 uses dot_product with length-3 vectors. A couple of months
ago Paul Thomas proposed a patch to inline dot_product.
On a G5 (1.8 GHz) I get 295.2s without the patch and 68.4s with it.
As far as I can tell, on the G5 the inlined dot_product is always faster
than the library function up to size 8192 (for large sizes the library
function is no better than a plain do loop).
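To make the length-3 case concrete, here is a minimal sketch of what inlining amounts to at the source level. It is not Paul Thomas's patch and not code from induct.f90; the program name and the sample values are made up for illustration. It only shows that the intrinsic call and the hand-expanded expression compute the same thing, so the difference is purely the overhead of calling the library.

```fortran
program dot3_sketch
  ! Hypothetical illustration: names and values are assumptions,
  ! not taken from induct.f90 or from the patch.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(3), b(3), s_intrinsic, s_inline

  a = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  b = (/ 4.0_dp, 5.0_dp, 6.0_dp /)

  ! Intrinsic form, as one would write it in the source.
  s_intrinsic = dot_product(a, b)

  ! Hand-expanded form: what an inlined dot_product reduces to
  ! for a fixed length of 3 (no library call, no loop overhead).
  s_inline = a(1)*b(1) + a(2)*b(2) + a(3)*b(3)

  print *, s_intrinsic, s_inline

  ! The operands are small integers, so both sums are exact.
  if (s_intrinsic /= s_inline) stop 1
end program dot3_sketch
```

For vectors this short, the cost of the call itself can easily exceed the three multiplies and two adds, which is consistent with the timings above.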
I'd like to mention that test_fpu.f90 uses dot_product a lot
in the crout subroutine, and I get
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 11.0 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 10.3 sec Err= 0.000000000000003
Test3 - Crout 2 (1001x1001) inverts 16.5 sec Err= 0.000000000000067
Test4 - Lapack 2 (1001x1001) inverts 9.2 sec Err= 0.000000000000259
total = 47.1 sec
without the patch, and
Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 4.6 sec Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 8.3 sec Err= 0.000000000000003
Test3 - Crout 2 (1001x1001) inverts 15.4 sec Err= 0.000000000000067
Test4 - Lapack 2 (1001x1001) inverts 9.2 sec Err= 0.000000000000259
total = 37.5 sec
Note that I also use another of Paul Thomas's patches, in dependency.c,
to speed up the line
b(:,j) = b(:,j)-temp*c
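For readers unfamiliar with why such a line can be slow: the same section b(:,j) appears on both sides of the assignment, so a compiler that cannot prove the overlap is element-by-element may evaluate the right-hand side into a temporary array first. That is my assumption about what the patch addresses; the sketch below is not the patch, and the array size and values are made up. It only checks that the array-section form and an explicit loop without a temporary give the same result.

```fortran
program column_update_sketch
  ! Hypothetical illustration: sizes and values are assumptions,
  ! not taken from test_fpu.f90.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: b(n, n), b_ref(n, n), c(n), temp
  integer :: i, j

  ! Fill b and c with small integer values.
  do j = 1, n
     do i = 1, n
        b(i, j) = real(i + n*(j - 1), dp)
     end do
     c(j) = real(j, dp)
  end do
  b_ref = b
  temp = 0.5_dp
  j = 2

  ! Array-section form, as in the line quoted above.
  b(:, j) = b(:, j) - temp*c

  ! Equivalent explicit loop: element i of the result depends only
  ! on element i of the old column, so no temporary is needed.
  do i = 1, n
     b_ref(i, j) = b_ref(i, j) - temp*c(i)
  end do

  ! All values are exactly representable, so the comparison is exact.
  if (any(b /= b_ref)) stop 1
  print *, b(:, j)
end program column_update_sketch
```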
So maybe we can team up to press Paul to include these two
patches in the snapshots (at least in 4.2).
Dominique