This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libfortran/51119] MATMUL slow for large matrices
- From: "jvdelisle at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sun, 08 Nov 2015 23:33:14 +0000
- Subject: [Bug libfortran/51119] MATMUL slow for large matrices
- Auto-submitted: auto-generated
- References: <bug-51119-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvdelisle at gcc dot gnu.org
--- Comment #16 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
For what it's worth:
$ gfc pr51119.f90 -lblas -fno-external-blas -Ofast -march=native
$ ./a.out
Time, MATMUL: 21.2483196 21.254449646000001 1.5055670945599979
Time, dgemm: 33.2441711 33.243087289000002 .96260614189671445
This is on a laptop that is not taking advantage of a tuned BLAS. If I replace
-Ofast with -O2, I get:
$ ./a.out
Time, MATMUL: 43.6199570 43.625358022999997 0.73351833543988521
Time, dgemm: 33.2262650 33.226961453000001 0.96307331759072967
-O3 brings MATMUL performance back in line with -Ofast. It seems odd to me that
-O2 does not do well.
Regardless, at -O3 and -Ofast the internal MATMUL is doing better than BLAS on
this platform, but 1.5 GFLOPS is pretty lame either way.
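
As a sanity check on the figures above, here is a small sketch that multiplies
the second timing column by the third column for each run. It assumes the third
column is a rate in GFLOPS (consistent with the "1.5" figure quoted above) and
that it was derived from the second, higher-precision time. The resulting total
of roughly 32 GFLOP per run is inferred from the numbers only; the comment does
not state the matrix dimensions or the operation count. The names runs and
implied_gflop are made up for this illustration.

```python
# Cross-check of the third output column, ASSUMING it is GFLOPS computed
# from the second (higher-precision) timing column. This is an inference
# from the posted numbers, not something stated in the comment.

# (label, seconds from column 2, reported rate from column 3)
runs = [
    ("MATMUL -Ofast", 21.254449646000001, 1.5055670945599979),
    ("dgemm  -Ofast", 33.243087289000002, 0.96260614189671445),
    ("MATMUL -O2", 43.625358022999997, 0.73351833543988521),
    ("dgemm  -O2", 33.226961453000001, 0.96307331759072967),
]


def implied_gflop(seconds, gflops):
    """Work implied by a run: rate (GFLOPS) times elapsed time (s)."""
    return seconds * gflops


for label, seconds, gflops in runs:
    # If the assumption holds, every run should imply the same amount of work.
    print(f"{label:14s} {implied_gflop(seconds, gflops):.6f} GFLOP")

# Ratio of MATMUL run times, -O2 versus -Ofast, from the timings alone.
speedup = 43.625358022999997 / 21.254449646000001
print(f"MATMUL -O2 / -Ofast time ratio: {speedup:.2f}")
```

All four runs imply the same amount of work, which supports reading the third
column as GFLOPS, and the timings show MATMUL taking about twice as long at -O2
as at -Ofast while dgemm is unchanged.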