This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.
Possible performance issue with gfortran? denormalized numbers
- From: Jose Miguel Reynolds Barredo <jmrb2002 at gmail dot com>
- To: gcc-help at gcc dot gnu dot org
- Date: Mon, 1 Feb 2016 12:55:06 +0100
- Subject: Possible performance issue with gfortran? denormalized numbers
- Authentication-results: sourceware.org; auth=none
Hi everyone,
I was developing a tridiagonal block solver and I found a performance
issue that is intriguing me: doing operations with numbers in the
denormalized range is around ten times slower than with regular numbers
(https://en.wikipedia.org/wiki/Denormal_number). As an example, I
built a very simple code:
program test
  implicit none
  integer :: j
  real :: m1(10000000), m2(10000000)
  do j = 1, 10
    m1 = 1.9123012391238E-39 ! if the exponent is changed to -37, the code is ten times faster
    m2 = 1.2903458938459E0*m1
  enddo
  print *, 'stop', m1(10000), m2(2323)
end program test
First, some comments about the code:
- The vectors have to be long enough not to fit into the cache.
- The exact constant values are arbitrary, chosen just to make sure the
compiler's optimizer is not doing something strange.
- The last printed values force the optimizer to actually run the code.
- The code is compiled with the "gfortran -O2" option (version 4.4.7).
If we run this example in single precision, the code takes several
seconds, but if we change the value 1.91230123912389E-39 to
1.91230123912389E-38, the code takes ten times less time. The issue
has to do with being in the denormalized range. If I run the code with
ifort, the case with 1.91230123912389E-39 gives the following warning:
test.f90(14): remark #7920: The value was too small when converting to
REAL(KIND=4); the result is in the denormalized range.
[1.91230123912389E-39]
m1=1.91230123912389E-39!321 for double precision
but the code runs as fast as the case with E-37.
Some extra tips:
- The issue does not appear if all the data fits inside the CPU
registers or the cache: with a smaller vector size no problem appears,
which is why the vectors have to be long enough.
- The same issue appears in double precision at exponents around -308,
just at the limit of the denormalized range.
- I ran it on several platforms (all Intel-based, but with different
processor models) and the issue persists.
You may say that no real code runs on numbers that small, but in my
case a tridiagonal solver acts on some set of variables as an attractor
toward that range, and they never reach true zero. In fact, the
attractor can be simplified to this:
aux1=-9*m1(j-1)+70*m2(j-1)
aux2=-9*m2(j-1)
m1(j)=( 7*aux1-54*aux2)/4790
m2(j)=(77*aux1+69*aux2)/4790
(in this case the exact values are not critical, but they do matter).
A code iterating this map converges to numbers in the denormalized
range and stays there.
I have worked around the issue by simply setting to 0.0E0 any value
smaller than, for example, 1.0E-40, which is more than enough for my
requirements, but it would be great to find a better solution. Thank
you for any help!
JM