This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug libfortran/51119] MATMUL slow for large matrices

From: "jvdelisle at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 25 Nov 2015 00:48:17 +0000
Subject: [Bug libfortran/51119] MATMUL slow for large matrices
Auto-submitted: auto-generated
References: <bug-51119-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #30 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
(In reply to Joost VandeVondele from comment #29)

> These slides show how to reach 90% of peak:
> http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/
> the code actually is not too ugly, and I think there is no need for the
> explicit vector intrinsics with gcc.

The 90% of peak is achieved using SSE registers.  I went ahead and built the
example, and on my laptop (the slow machine) I get about 4.8 GFLOPS on a
single core.  So we could use this example and back off from the SSE
optimizations to get an internal MATMUL that is not architecture dependent, and
perhaps leave the rest to an external optimized BLAS.
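
To make the idea concrete, here is a minimal sketch of what a portable, cache-blocked
multiply without any SSE intrinsics might look like.  It is not the code from the
slides and not what libgfortran does; the function name blocked_matmul, the helper
min_sz, and the tile size TILE are assumptions chosen for illustration, and the
arrays are assumed contiguous and column-major (Fortran order).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; a real value would be tuned to the cache size. */
#define TILE 64

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* C (m x n) = A (m x k) * B (k x n).
   All three arrays are contiguous and column-major, as in Fortran.
   Plain C only: vectorization is left to the compiler. */
void blocked_matmul(size_t m, size_t n, size_t k,
                    const double *a, const double *b, double *c)
{
    /* The result must not alias either input. */
    assert(c != a && c != b);

    for (size_t i = 0; i < m * n; i++)
        c[i] = 0.0;

    /* Walk the matrices tile by tile so each tile stays in cache. */
    for (size_t jj = 0; jj < n; jj += TILE) {
        size_t jend = min_sz(jj + TILE, n);
        for (size_t ll = 0; ll < k; ll += TILE) {
            size_t lend = min_sz(ll + TILE, k);
            for (size_t ii = 0; ii < m; ii += TILE) {
                size_t iend = min_sz(ii + TILE, m);
                for (size_t j = jj; j < jend; j++) {
                    for (size_t l = ll; l < lend; l++) {
                        double blj = b[l + j * k];
                        /* Unit-stride inner loop over a column of A and C. */
                        for (size_t i = ii; i < iend; i++)
                            c[i + j * m] += a[i + l * m] * blj;
                    }
                }
            }
        }
    }
}
```

The inner loop runs with unit stride over columns of A and C, which should give
gcc a reasonable chance to auto-vectorize it at -O3 without explicit intrinsics.
How close this gets to the numbers above would have to be measured.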
