This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: (a+b)+c should be replaced by a+(b+c)


Jakub Jelinek wrote:
On Thu, Mar 25, 2004 at 09:21:48AM -0500, Scott Robert Ladd wrote:

Joost VandeVondele wrote:

BTW, timing of the code below on IBM SP4 with xlf90, would be useful to
see how gfortran performs.

Being in a benchmarking mood, I took your code and compiled it on a 2.8GHz Pentium 4 (Northwood core). The results did not show gfortran in a very good light:

- - - - - - - - - - - - - - - - - - - - -

Tycho$ ifort -O3 -tpp7 -xN -ipo -o matmuli matmul.for
IPO: using IR for /tmp/ifortyRX1Wg.o
IPO: performing single-file optimizations
matmul.for(6) : (col. 6) remark: LOOP WAS VECTORIZED.
matmul.for(7) : (col. 6) remark: LOOP WAS VECTORIZED.
matmul.for(8) : (col. 6) remark: LOOP WAS VECTORIZED.
Tycho:$ ./matmuli
  5.90410300000000        10.2399999999998
Tycho$ gfortran -o matmulg -O3 -ffast-math -march=pentium4 matmul.for


You forgot -mfpmath=sse. That is only the default for -m64.

Jakub


Good point; I've been doing Opteron work for a week, and was getting used to not explicitly declaring certain flags.


Also, a minimized browser was playing a &%$!! Flash animation in the background, so I'll run numbers on a clean machine without the overhead.

And the compiler says:

 - - - - - - - - - - - - - - - -
Tycho$ gfortran -o matmulg -O3 -march=pentium4 -ffast-math matmul.for
Tycho$ ./matmulg
    64.9091330000000         10.2400000000000

Tycho$ gfortran -o matmulg -O3 -march=pentium4 -ffast-math -mfpmath=sse matmul.for
Tycho$ ./matmulg
    64.6051790000000         10.2399999999998


Tycho$ gfortran -o matmulg -O3 -march=pentium4 -mfpmath=sse matmul.for
Tycho$ ./matmulg
    64.7361590000000         10.2399999999998

Tycho$ gfortran -o matmulg -O3 -march=pentium4 matmul.for
Tycho$ ./matmulg
    64.7751530000000         10.2400000000000
Tycho$

- - - - - - - - - - - - - - - -

[dry_sarcasm]
Well, we can see that -ffast-math *really* helps in this situation, huh?
[/dry_sarcasm]


Nor did -mfpmath=sse show much value for this test. In my experience, -mfpmath=sse often fails to produce faster code (with either gfortran or gcc).
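
To put a number on how little the flags mattered, here is a minimal sketch that computes the relative spread across the four gfortran runs above. It assumes the first output column is elapsed time in seconds (the program's source is not shown in this message, so that is a guess). The dictionary labels and the helper name relative_spread are made up for illustration.

```python
# Timings copied from the first output column of the four gfortran runs above.
# Assumption: that column is elapsed time in seconds.
gfortran_times = {
    "-ffast-math": 64.909133,
    "-ffast-math -mfpmath=sse": 64.605179,
    "-mfpmath=sse": 64.736159,
    "(neither flag)": 64.775153,
}


def relative_spread(times):
    """Return (slowest - fastest) / fastest for a collection of timings."""
    times = list(times)
    fastest = min(times)
    slowest = max(times)
    return (slowest - fastest) / fastest


spread = relative_spread(gfortran_times.values())
print(f"spread across flag combinations: {spread:.2%}")
```

The spread comes out under half a percent, which is probably within run-to-run noise for a single timing on a desktop machine.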


What about Intel Fortran with their -mp1 and -mp options?

- - - - - - - - - - - - - - - -

Tycho$ ifort -O3 -tpp7 -xN -ipo -o matmuli matmul.for
Tycho$ ./matmuli
   4.85226200000000        10.2399999999998

Tycho$ ifort -O3 -tpp7 -xN -ipo -mp1 -o matmuli matmul.for
Tycho:~/projects/spikes$ ./matmuli
   4.90425400000000        10.2399999999998

Tycho$ ifort -O3 -tpp7 -xN -ipo -mp -o matmuli matmul.for
Tycho$ ./matmuli
   66.0699560000000        10.2399999999998

- - - - - - - - - - - - - - - -

Forcing Intel to stick with the "rules" (-mp) does slow its performance: the timing goes from 4.85 to 66.07, roughly 13 times slower and on par with gfortran. Certainly some food for thought...
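
For readers wondering why the "rules" matter at all: floating-point addition is not associative, so a compiler that rewrites (a+b)+c as a+(b+c) can change the result. The sketch below shows this with IEEE double precision, using Python only because its floats are plain doubles and nothing reorders the arithmetic. The function names and the three values are chosen for illustration and have nothing to do with matmul.for. Whether reassociation is really what separates the fast and slow ifort runs above is an assumption, not something measured here.

```python
def left_assoc(a, b, c):
    """Evaluate (a + b) + c exactly as written."""
    return (a + b) + c


def right_assoc(a, b, c):
    """Evaluate a + (b + c) exactly as written."""
    return a + (b + c)


# Illustrative values: at a magnitude of 1e16, adjacent doubles are 2 apart,
# so adding 1.0 to -1e16 rounds straight back to -1e16.
a, b, c = 1.0e16, -1.0e16, 1.0

left = left_assoc(a, b, c)    # a and b cancel first, then c survives
right = right_assoc(a, b, c)  # c is absorbed into b, then a and b cancel

print("(a+b)+c =", left)
print("a+(b+c) =", right)
```

The two groupings give 1.0 and 0.0 respectively. Tiny differences of this kind in the last digits, like 10.2399999999998 versus 10.2400000000000 in the runs above, are what one would expect when operations are evaluated in a different order or precision.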


--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing

