This is the mail archive of the fortran@gcc.gnu.org mailing list for the GNU Fortran project.



Re: Polyhedron shootout between g95 and gfortran


Steven Bosscher wrote:

> (It looks like tfft didn't work with g95 at all???)

Although this is certainly off topic for this list, I suspect that
you are using a g95 version that prints the warning
"Default integer of 64 bits, may break older programs". If so,
you must compile tfft.f90 with the option -i4. On an AMD machine
(2 GHz, 1 MB L2 cache) I get 10.0 s. Note that this program measures
the size of the L2 cache more than anything else.

> 1) gfortran has one aermod test failing:

On a G5 with GNU Fortran 95 (GCC) 4.2.0 20060218 (experimental), I get

...
  --------- Summary of Total Messages --------
  
 A Total of            0 Fatal Error Message(s)
 A Total of            0 Warning Message(s)
 A Total of          638 Informational Message(s)

 A Total of          534 Calm Hours Identified

 A Total of          104 Missing Hours Identified (  4.81 Percent)
  
  
    ******** FATAL ERROR MESSAGES ******** 
               ***  NONE  ***         
  
  
    ********   WARNING MESSAGES   ******** 
               ***  NONE  ***        
  

    ************************************
    *** AERMOD Finishes Successfully ***
    ************************************

> 2) gfortran is _awful_ for induct... Does anyone know why?

induct.f90 uses dot_product with length-3 vectors. A couple of months
ago Paul Thomas proposed a patch to inline dot_product.
On a G5 (1.8 GHz) I get 295.2 s without the patch and 68.4 s with it.

As far as I can tell, on the G5 the inlined dot_product is always
faster than the library function for sizes up to 8192 (for large sizes
the library function is no better than a plain do loop).
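
To illustrate what inlining means for the length-3 case, here is a
minimal sketch. It is not taken from induct.f90 or from the patch;
the program and variable names are made up for the example. It
compares the intrinsic with the hand-expanded expression that an
inlined dot_product of two length-3 real vectors amounts to.

```fortran
program dot3_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(3), b(3), s_intrinsic, s_inline

  a = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  b = (/ 4.0_dp, 5.0_dp, 6.0_dp /)

  ! Intrinsic form, as it would appear in user code. Without inlining,
  ! the compiler may turn this into a call to a runtime library routine.
  s_intrinsic = dot_product(a, b)

  ! Hand-expanded form: three multiplies and two adds, no call overhead.
  s_inline = a(1)*b(1) + a(2)*b(2) + a(3)*b(3)

  ! Both values are 32 for this input.
  print *, s_intrinsic, s_inline
end program dot3_sketch
```

For vectors this short, the cost of setting up and making a library
call can easily exceed the cost of the arithmetic itself, which would
be consistent with the induct timings above.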

I'd also like to mention that test_fpu.f90 uses dot_product heavily
in the crout subroutine, and I get

  Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts 11.0 sec  Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts 10.3 sec  Err= 0.000000000000003
Test3 - Crout  2 (1001x1001) inverts 16.5 sec  Err= 0.000000000000067
Test4 - Lapack 2 (1001x1001) inverts  9.2 sec  Err= 0.000000000000259
                             total = 47.1 sec

without the patch, and

  Benchmark running, hopefully as only ACTIVE task
Test1 - Gauss 2000 (101x101) inverts  4.6 sec  Err= 0.000000000000002
Test2 - Crout 2000 (101x101) inverts  8.3 sec  Err= 0.000000000000003
Test3 - Crout  2 (1001x1001) inverts 15.4 sec  Err= 0.000000000000067
Test4 - Lapack 2 (1001x1001) inverts  9.2 sec  Err= 0.000000000000259
                             total = 37.5 sec

Note that I also use another of Paul Thomas's patches, in dependency.c,
to speed up the line

      b(:,j) = b(:,j)-temp*c
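
As a rough sketch of why dependency analysis matters for such a line
(my assumption being that the gain comes from not needing an array
temporary; I am not describing the patch itself): element i of the
result depends only on element i of b(:,j) and c, so the assignment
can be done in place as a plain loop. The program below, with
made-up names and sizes, checks that the two forms agree.

```fortran
program column_update_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4          ! illustrative size only
  real(dp) :: b(n,n), b_loop(n,n), c(n), temp
  integer :: i, j

  call random_number(b)
  call random_number(c)
  b_loop = b
  temp = 0.5_dp
  j = 2

  ! Array-syntax form, as in the quoted line.
  b(:,j) = b(:,j) - temp*c

  ! Equivalent explicit loop: each element is read and then written
  ! at the same index, so no temporary copy of the column is needed.
  do i = 1, n
     b_loop(i,j) = b_loop(i,j) - temp*c(i)
  end do

  ! Largest difference between the two results; expected to be zero
  ! or at rounding level.
  print *, maxval(abs(b - b_loop))
end program column_update_sketch
```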

So maybe we can team up to press Paul to include these two
patches in the snapshots (at least in 4.2).

Dominique

