[Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step

burnus at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Sun Dec 12 10:50:00 GMT 2010


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:50:09 UTC ---
(In reply to comment #3)
> (I don't understand why the MATMUL part differs that much - it should call the
> same BLAS function [via the same GCC 4.6 libgfortran.so wrapper] and LTO should
> not affect it.)

Seemingly, LTO is crucial for 4.5: without LTO, the direct dgemm call gets
slower, but the libgfortran version (MATMUL) gets faster:

$ gfortran-4.5 -fexternal-blas -O3 -ffast-math -march=native test.f90 dgemm.f \
    lsame.f xerbla.f && ./a.out
 Time, MATMUL:    1.3200819       53.480084765505403     
 dgemm:    1.3120821       56.452265589399069

$ gfortran-4.5 -c -flto -fexternal-blas -O3 -ffast-math -march=native test.f90 \
    dgemm.f lsame.f xerbla.f
$ gfortran-4.5 -flto -O3 -ffast-math -march=native test.o dgemm.o lsame.o \
    xerbla.o
$ ./a.out
 Time, MATMUL:    1.3080810       53.480084765505403     
 dgemm:    1.0800680       56.452265589399069     

Here, for GCC 4.5, one sees that LTO improves the performance of the direct
call of dgemm, and that it does not matter whether compilation and linking are
done in a single step or in two steps.
However, with GCC 4.5 as well, the single-step build pessimizes the performance
of the libgfortran MATMUL (which is a wrapper for dgemm).
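
To put a number on the two GCC 4.5 runs shown above (no LTO vs. two-step LTO),
the relative change in run time can be computed from the posted timings. This
is only a rough sketch; pct_change is a hypothetical helper name, not something
from the test case, and it assumes a POSIX shell with awk.

```sh
#!/bin/sh
# Relative change in run time, in percent, from a baseline to a new timing.
# Negative values mean the new build is faster.
# pct_change is a hypothetical helper, not part of the original test case.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Timings in seconds as posted above for GCC 4.5 (no LTO -> two-step LTO).
echo "MATMUL: $(pct_change 1.3200819 1.3080810) %"
echo "dgemm:  $(pct_change 1.3120821 1.0800680) %"
```

With these inputs the MATMUL time changes by about -0.9 % and the direct dgemm
time by about -17.7 %, i.e. only the direct call benefits noticeably here.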


