This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Re: Performance analysis of Polyhedron/gas_dyn

From: Janne Blomqvist <blomqvist dot janne at gmail dot com>
To: Richard Guenther <richard dot guenther at gmail dot com>
Cc: gfortran <fortran at gcc dot gnu dot org>, gcc at gcc dot gnu dot org
Date: Fri, 27 Apr 2007 13:12:58 +0300
Subject: Re: Performance analysis of Polyhedron/gas_dyn
References: <4631A891.5020002@gmail.com> <84fc9c000704270127s4eb0030p9d27fd5fdff73cf8@mail.gmail.com>

Richard Guenther wrote:
> See also http://www.suse.de/~gcctest/c++bench/polyhedron/analysis.html
> (same conclusion for gas_dyn).

Thanks, I seem to have completely missed that page (though I was aware of your polyhedron tester).

> On 4/27/07, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:
>> The reason, it seems, is that ifort (and presumably other commercial
>> compilers with competitive scores in gas_dyn) avoids calculating
>> divisions and square roots, replacing them with reciprocals and
>> reciprocal square roots. E.g. in EOS sqrt(a/b) can be calculated as
>> 1/sqrt(b*(1/a)). This has a big impact on performance, since the SSE
>> instruction set contains very fast instructions for this, rcpps, rcpss,
>> rsqrtps, rsqrtss (PPC/Altivec also has equivalent instructions). These
>> instructions have latencies of 1-2 cycles vs. dozens or even hundreds of
>> cycles for normal division and square root. The price to be paid for
>> this speed is that these reciprocal instructions have an accuracy of
>> only 12 bits, so clearly they can be enabled only for -ffast-math. And
>> they are available only for single precision. I'll file a
>> missed-optimization PR about this.


> I think that even with -ffast-math, 12 bits of accuracy is not OK. There
> is the possibility of doing another Newton iteration step to improve
> accuracy; that would be OK for -ffast-math. We can, though, add an
> extra flag -msserecip or however you'd call it to enable use of the
> instructions with less accuracy.

I agree it can be an issue, but OTOH people who care about precision probably (1) avoid -ffast-math and (2) use double precision (where these reciprocal instructions are not available). Intel calls the corresponding option -no-prec-div, and it is enabled by the "-fast" catch-all option.
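
To make the Newton step concrete, here is a rough C sketch using the SSE intrinsics from <xmmintrin.h>. It is only an illustration under stated assumptions: the helper names (rsqrt_refined, rcp_refined, sqrt_ratio_fast) are made up for this mail, it assumes an x86 target with SSE enabled, and it is not what GCC or ifort actually emit. It takes the ~12-bit estimate from rsqrtss/rcpss and applies one Newton-Raphson iteration, which roughly doubles the number of correct bits.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ss, _mm_rcp_ss */

/* Hypothetical helper: approximate 1/sqrt(x) with rsqrtss (about 12 bits),
   then refine with one Newton-Raphson step:
       y' = y * (1.5 - 0.5 * x * y * y)                                  */
static float rsqrt_refined(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical helper: approximate 1/a with rcpss (about 12 bits),
   then refine with one Newton-Raphson step:
       r' = r * (2 - a * r)                                              */
static float rcp_refined(float a)
{
    float r = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));
    return r * (2.0f - a * r);
}

/* sqrt(a/b) rewritten as 1/sqrt(b * (1/a)), with no division or sqrt.
   Only meaningful for a > 0 and b > 0; zero, infinity and NaN are not
   handled the way a real sqrt(a/b) would handle them.                   */
static float sqrt_ratio_fast(float a, float b)
{
    return rsqrt_refined(b * rcp_refined(a));
}
```

For example, sqrt_ratio_fast(2.0f, 8.0f) should come out very close to 0.5, though not necessarily bit-identical to sqrtf(2.0f / 8.0f), which is exactly why this kind of transformation belongs behind -ffast-math or a dedicated flag.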

On a related note, our beloved competitors generally have some high level flag for combining all these fancy and potentially unsafe optimizations (e.g. -O4, -fast, -fastsse, -Ofast, etc.). For gcc, at least FP benchmarks seem to do generally well with something like "-O3 -funroll-loops -ftree-vectorize -ffast-math -march=native -mfpmath=sse", but it's quite a mouthful.

--
Janne Blomqvist

Follow-Ups:
- Re: Performance analysis of Polyhedron/gas_dyn
  - From: Geert Bosch

References:
- Re: Performance analysis of Polyhedron/gas_dyn
  - From: Richard Guenther
