This is the mail archive of the
fortran@gcc.gnu.org
mailing list for the GNU Fortran project.
Re: 32b intel fortran vs. 64b linux gfortran
- From: Tobias Burnus <burnus at net-b dot de>
- To: Edvardsen Kåre <kare dot edvardsen at uit dot no>
- Cc: "fortran at gcc dot gnu dot org" <fortran at gcc dot gnu dot org>
- Date: Mon, 14 May 2012 14:38:02 +0200
- Subject: Re: 32b intel fortran vs. 64b linux gfortran
- References: <1336994866.4783.19.camel@kare-desktop>
On 05/14/2012 01:27 PM, Edvardsen Kåre wrote:
> I'm trying to compare the performance of my code between 32b intel
> fortran (Win7) and 64b linux gfortran. What I see is that the argument
> for the trigonometric functions (sin, cos, tan etc.) only take 7 digits
> on my Windows 32b intel fortran when compiled, when on my 64b linux
> gfortran more digits are used.
Can you give an example? The default REAL data type without extra flags
should be the same:
    real, volatile :: r
    r = sqrt(2.0)
    r = sin(r)
    print '(f16.13)', r
    end
should print 0.9877659678459 with both compilers. (I assume that the
math libraries give exactly, and not only nearly, the same result.)
There is a difference for list-directed I/O, where gfortran prints more
digits by default. For
    print *, r
one gets 0.9877660 with ifort and 0.987765968 with gfortran, but
internally that is the same binary number.
If one converts a binary FP number to a decimal number - or vice versa -
one might have the problem that the number is not exactly representable.
For instance, the decimal number 0.1 has to be rounded either up or
down, as no binary number matches it exactly; thus, if one prints the
variable with '(f16.14)', one might get either of the two lines
    0.10000000149012
    0.09999999403954
The first line is what one typically gets for 0.1. (The second has been
obtained by nearest(0.1,-1.0).) Thus, a compiler could simply print
"0.1" for the first value, as 32-bit binary FP has no other number that
is closer to 0.1. It is simply an implementation choice whether one
prints fewer (ifort) or more (gfortran) digits with "*" (list-directed
I/O).
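
A minimal sketch that reproduces the two lines above, assuming the
default REAL is 32-bit IEEE single precision (the program and variable
names are only illustrative). The list-directed output at the end is
where ifort and gfortran are expected to differ in the number of digits:

```fortran
program tenth
  implicit none
  real :: x, below

  x = 0.1                    ! rounded to the nearest single-precision number
  below = nearest(x, -1.0)   ! next representable number below x

  ! Explicit format: same digits requested from every compiler
  print '(f16.14)', x
  print '(f16.14)', below

  ! List-directed output: the number of digits is the compiler's choice
  print *, x
  print *, below
end program tenth
```

With an explicit format such as '(f16.14)', the output no longer depends
on the compiler's default for "*", which makes results from the two
compilers easier to compare.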
When doing performance comparisons, recall that the two compilers have
different optimization defaults and that even the same flag (-O2) can
mean different things. For benchmarks, I use, e.g.,
    gfortran -march=native -ffast-math -funroll-loops -O3 \
        -finline-limit=600 -fstack-arrays -fno-protect-parens
(and possibly compiling and linking with -flto) and
    ifort -fast
For non-benchmark use, you should consider leaving out -ffast-math and
-fno-protect-parens, as - depending on the algorithm - they might lead
to wrong results. Though, you might be lucky and your algorithm is
stable enough for your input, such that the result is not or only
negligibly affected. (See the GCC man page/documentation and
http://gcc.gnu.org/wiki/FloatingPointMath )
(For ifort, you have to do something similar: for instance "-assume
protect_parens", as reordering across parentheses is enabled by default
there, but also something similar to -fno-fast-math; I think it could
be -prec-div, but there might be more or the name could be different.)
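
A small sketch of why reordering across parentheses can matter, assuming
32-bit IEEE single precision and arithmetic that is actually carried out
in single precision (the program name and the input values are only
illustrative). The values are read at run time so that the compiler
cannot fold the expression at compile time:

```fortran
program parens
  implicit none
  character(len=*), parameter :: txt = '1.0e8 1.0'
  real :: a, b, r

  ! Read the operands at run time (internal file) to avoid constant folding
  read (txt, *) a, b

  ! As written: a + b rounds back to a, because the spacing of
  ! single-precision numbers near 1.0e8 is larger than 1.0, so r is 0.0.
  ! If the compiler is allowed to reassociate, it may simplify the
  ! expression to b and give 1.0 instead.
  r = (a + b) - a

  print *, r
end program parens
```

Compiling this once with plain -O2 and once with -O3 -ffast-math
-fno-protect-parens is one way to check whether a given compiler version
actually applies the reordering; whether it does is not guaranteed.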
Note additionally that updating a compiler usually helps with the
performance. Thus, the newest ifort should win (on average) against an
old gfortran and vice versa. But for a single program, a difference of
a factor of 2 should not be surprising. When I compared GCC/gfortran 4.7
with ifort 12.0 and 12.1 using the Polyhedron benchmark, GCC was
minutely (<~ 1%) faster than 12.0, while 12.1 was a bit less than 7%
faster than GCC. And GCC 4.5 was 20% slower than 4.7.
Tobias