This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: (a+b)+c should be replaced by a+(b+c)
Jakub Jelinek wrote:
> On Thu, Mar 25, 2004 at 09:21:48AM -0500, Scott Robert Ladd wrote:
> > Joost VandeVondele wrote:
> > > BTW, timing of the code below on IBM SP4 with xlf90, would be useful to
> > > see how gfortran performs.
> >
> > Being in a benchmarking mood, I took your code and compiled it on a
> > 2.8GHz Pentium 4 (Northwood core). The results did not show gfortran in
> > a very good light:
> > - - - - - - - - - - - - - - - - - - - - -
> > Tycho$ ifort -O3 -tpp7 -xN -ipo -o matmuli matmul.for
> > IPO: using IR for /tmp/ifortyRX1Wg.o
> > IPO: performing single-file optimizations
> > matmul.for(6) : (col. 6) remark: LOOP WAS VECTORIZED.
> > matmul.for(7) : (col. 6) remark: LOOP WAS VECTORIZED.
> > matmul.for(8) : (col. 6) remark: LOOP WAS VECTORIZED.
> > Tycho:$ ./matmuli
> > 5.90410300000000 10.2399999999998
> > Tycho$ gfortran -o matmulg -O3 -ffast-math -march=pentium4 matmul.for
>
> You forgot -mfpmath=sse. That is only the default for -m64.
>
> Jakub
Good point; I've been doing Opteron work for a week, and was getting
used to not explicitly declaring certain flags.
Also, a minimized browser was playing a &%$!! Flash animation in the
background, so I'll run numbers on a clean machine without the overhead.
And the compiler says:
- - - - - - - - - - - - - - - -
Tycho$ gfortran -o matmulg -O3 -march=pentium4 -ffast-math matmul.for
Tycho$ ./matmulg
64.9091330000000 10.2400000000000
Tycho$ gfortran -o matmulg -O3 -march=pentium4 -ffast-math -mfpmath=sse matmul.for
Tycho$ ./matmulg
64.6051790000000 10.2399999999998
Tycho$ gfortran -o matmulg -O3 -march=pentium4 -mfpmath=sse matmul.for
Tycho$ ./matmulg
64.7361590000000 10.2399999999998
Tycho$ gfortran -o matmulg -O3 -march=pentium4 matmul.for
Tycho$ ./matmulg
64.7751530000000 10.2400000000000
Tycho$
- - - - - - - - - - - - - - - -
[dry_sarcasm]
Well, we can see that -ffast-math *really* helps in this situation, huh?
[/dry_sarcasm]
Nor did -mfpmath=sse show much value for this test. In my experience,
-mfpmath=sse often fails to produce faster code (with gfortran or gcc).
What about Intel Fortran with their -mp1 and -mp options?
- - - - - - - - - - - - - - - -
Tycho$ ifort -O3 -tpp7 -xN -ipo -o matmuli matmul.for
Tycho$ ./matmuli
4.85226200000000 10.2399999999998
Tycho$ ifort -O3 -tpp7 -xN -ipo -mp1 -o matmuli matmul.for
Tycho:~/projects/spikes$ ./matmuli
4.90425400000000 10.2399999999998
Tycho$ ifort -O3 -tpp7 -xN -ipo -mp -o matmuli matmul.for
Tycho$ ./matmuli
66.0699560000000 10.2399999999998
- - - - - - - - - - - - - - - -
Forcing Intel to stick with the "rules" does slow its performance.
Certainly some food for thought...
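For readers wondering why the "rules" in the subject line matter at all: floating-point addition is not associative, so (a+b)+c and a+(b+c) can round to different results. Below is a minimal sketch of that effect. It is written in Python only because its floats are IEEE 754 doubles and it is easy to run; the function names and operand values are hypothetical, chosen for illustration, and are not taken from matmul.for.

```python
# Illustration only: hypothetical helper names, operands picked so that
# the two groupings of the same sum round differently in double precision.

def left_to_right(a, b, c):
    """Sum in source order: (a + b) + c."""
    return (a + b) + c


def reassociated(a, b, c):
    """Same operands, regrouped: a + (b + c)."""
    return a + (b + c)


if __name__ == "__main__":
    a, b, c = 1.0e16, -1.0e16, 1.0
    # (a + b) is exactly 0.0, so adding c gives 1.0.
    print(left_to_right(a, b, c))
    # (b + c) cannot be represented exactly near 1e16 and rounds back
    # to -1e16, so the final sum is 0.0.
    print(reassociated(a, b, c))
```

With these operands the first grouping yields 1.0 and the second yields 0.0, because 1.0 is smaller than the spacing between adjacent doubles near 1e16. A compiler that is allowed to regroup operands (GCC's -ffast-math permits this) may therefore change results slightly, which is the trade-off behind the strict and relaxed modes timed above.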
--
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing