[Bug middle-end/46900] [4.6 Regression] 50% slowdown when linking with LTO in a single step
burnus at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Sun Dec 12 10:50:00 GMT 2010
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46900
--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> 2010-12-12 10:50:09 UTC ---
(In reply to comment #3)
> (I don't understand why the MATMUL part differs that much - it should call the
> same BLAS function [via the same GCC 4.6 libgfortran.so wrapper] and LTO should
> not affect it.)
Seemingly, LTO is crucial for 4.5 - without LTO, dgemm gets slower, but the
libgfortran version gets faster:
$ gfortran-4.5 -fexternal-blas -O3 -ffast-math -march=native test.f90 dgemm.f lsame.f xerbla.f && ./a.out
Time, MATMUL: 1.3200819 53.480084765505403
dgemm: 1.3120821 56.452265589399069
$ gfortran-4.5 -c -flto -fexternal-blas -O3 -ffast-math -march=native test.f90 dgemm.f lsame.f xerbla.f
$ gfortran-4.5 -flto -O3 -ffast-math -march=native test.o dgemm.o lsame.o xerbla.o
$ ./a.out
Time, MATMUL: 1.3080810 53.480084765505403
dgemm: 1.0800680 56.452265589399069
Here, for GCC 4.5, one sees that LTO improves the performance of the direct
dgemm call, and that it does not matter whether compilation and linking are
done in a single step or in two steps.
However, also for GCC 4.5, the single-step build pessimizes the performance of
the libgfortran MATMUL (which is a wrapper for dgemm).
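
To put numbers on the comparison, here is a minimal sketch that computes the relative change between the two gfortran-4.5 runs quoted above. It assumes the first number on each output line is a run time in seconds; the dictionary labels and the helper name percent_faster are made up for illustration and are not part of the test case.

```python
# Timings copied from the two gfortran-4.5 runs quoted above.
# Assumption: the first number printed per line is a run time in seconds.
timings = {
    "single step, no LTO": {"MATMUL": 1.3200819, "dgemm": 1.3120821},
    "two steps, -flto": {"MATMUL": 1.3080810, "dgemm": 1.0800680},
}


def percent_faster(baseline: float, other: float) -> float:
    """Return how much less time `other` takes than `baseline`, in percent."""
    return (baseline - other) / baseline * 100.0


base = timings["single step, no LTO"]
lto = timings["two steps, -flto"]
for routine in ("MATMUL", "dgemm"):
    change = percent_faster(base[routine], lto[routine])
    print(f"{routine}: {change:.1f}% less time with -flto")
```

With these two runs, the direct dgemm call takes roughly 18% less time with -flto, while the MATMUL timing changes by about 1%, which is likely within measurement noise for a single run.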