This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



(a+b)+c should be replaced by a+(b+c)


I think there is an obvious need for an optimization that rewrites
(a+b)+c -> a+(b+c), e.g. in many scientific codes.
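
A minimal sketch in C of why this rewrite has to be explicitly allowed
rather than applied silently: floating-point addition is not associative,
so the two groupings can round to different results. The function names
are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers for illustration. The volatile temporaries force
   the intermediate sum to be rounded to double and keep the compiler
   from regrouping or constant-folding the expression. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;   /* (a+b) is evaluated and rounded first */
    return t + c;
}

double sum_right(double a, double b, double c)
{
    volatile double t = b + c;   /* (b+c) is evaluated and rounded first */
    return a + t;
}

void show_difference(void)
{
    double a = 1.0e20, b = -1.0e20, c = 1.0;

    printf("(a+b)+c = %g\n", sum_left(a, b, c));
    printf("a+(b+c) = %g\n", sum_right(a, b, c));

    /* For these inputs the two groupings disagree. */
    assert(sum_left(a, b, c) != sum_right(a, b, c));
}
```

With IEEE double precision the first grouping gives 1 and the second
gives 0, because c is lost when it is added to -1.0e20. The inputs are
deliberately extreme. In typical kernels the difference is at the level
of rounding error, which is why it is reasonable to ask for the rewrite
as an opt-in option.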

Consider matrix multiply:
do k=1,N
 do j=1,N
  do i=1,N
   c(i,j)=c(i,j)+a(i,k)*b(k,j)
  enddo
 enddo
enddo

Good compilers (e.g. xlf90) will, at -O4, apply higher-order loop
transforms that introduce blocking, independent FMAs, etc., which makes
this little piece of code about 100 times faster at -O4 than at -O2 (what
about LNO/SSA?). This can only be done if you allow (a+b)+c -> a+(b+c),
which is basically what any optimized BLAS routine does. Matrix multiply
is a trivial example: if you want BLAS performance, call BLAS. But there
are many other kernels like this in scientific code that are not BLAS,
and you can't expect a scientist to hand-unroll and block every kernel to
the appropriate depth for every machine. There needs to be a compiler
option to do this, and again, it is only possible if (a+b)+c -> a+(b+c)
is allowed.
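
To make the dependence on reassociation concrete, here is a rough sketch
in C (not what xlf90 or any particular compiler actually emits) of the
simplest such transform: splitting one running sum into two independent
accumulators. The function names are made up for this example, and the
arrays stand in for one row of a and one column of b from the kernel
above.

```c
#include <assert.h>
#include <stddef.h>

/* Reference order: a single accumulator, additions strictly in k order,
   like the successive updates of c(i,j) in the original loop nest. */
double dot_serial(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; ++k)
        s += x[k] * y[k];
    return s;
}

/* Reassociated order: even and odd terms go into separate partial sums
   that do not depend on each other, so the two multiply-adds can be in
   flight at the same time. The partial sums are combined at the end. */
double dot_two_acc(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    double s0 = 0.0, s1 = 0.0;
    size_t k = 0;
    for (; k + 1 < n; k += 2) {
        s0 += x[k]     * y[k];
        s1 += x[k + 1] * y[k + 1];
    }
    if (k < n)                   /* leftover element when n is odd */
        s0 += x[k] * y[k];
    return s0 + s1;
}
```

Both functions add the same terms, but dot_two_acc groups them as
(x0*y0 + x2*y2 + ...) + (x1*y1 + x3*y3 + ...) instead of strictly left
to right. For well-scaled data the results agree to within rounding, but
they are not guaranteed to be bit-identical, so a compiler may only turn
the first form into the second when reassociation is permitted.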

Joost

