This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: (a+b)+c should be replaced by a+(b+c)

From: Robert Dewar <dewar at gnat dot com>
To: Joost VandeVondele <jv244 at hermes dot cam dot ac dot uk>
Cc: gcc at gcc dot gnu dot org
Date: Thu, 25 Mar 2004 07:37:27 -0500
Subject: Re: (a+b)+c should be replaced by a+(b+c)
References: <Pine.SOL.4.58.0403250711490.18567@yellow.csi.cam.ac.uk>

Joost VandeVondele wrote:

good compilers (e.g. xlf90) will (at -O4) do higher order transforms of
the loop to introduce blocking, independent FMAs, ... that makes this
little piece of code about 100 times faster at O4 than O2 (what about
LNO/SSA?). This can only be done if you allow (a+b)+c -> a+(b+c). It is
basically what any optimized blas routine will do. Matrix multiply is a
trivial example, if you want blas performance, call blas. There are many
other kernels like this in e.g. scientific code that are not blas. You
can't expect a scientist to hand unroll and block any kernel to the
appropriate depth for any machine. There need to be a compiler option to
do this. This can only be done if you allow (a+b)+c -> a+(b+c).


Can you really deduce this freedom from later versions of the Fortran
standard?

Follow-Ups:
- Re: (a+b)+c should be replaced by a+(b+c)
  - From: Joost VandeVondele

References:
- (a+b)+c should be replaced by a+(b+c)
  - From: Joost VandeVondele

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]