[Bug libfortran/51119] MATMUL slow for large matrices

jvdelisle at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Mon Nov 23 17:58:00 GMT 2015


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #23 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #21)
> > Hidden behind a -fexternal-blas-n switch might be an option. Including GPUs
> > seems even a tad more tricky. We have a paper on GPU (small) matrix
> > multiplication, http://dbcsr.cp2k.org/_media/gpu_book_chapter_submitted.pdf
> 
> Quite interesting what can be done with GPUs...
> 

Run-of-the-mill graphics processing units have many floating-point compute
cores; 128 cores is not unusual, and often there are far more. Each core
performs basic operations such as a + b * c on scalars, along with other useful
functions. Software such as OpenCL compiles compute kernels that run
efficiently in parallel on these GPU architectures. clBLAS is a runtime library
that encapsulates this capability behind a BLAS-compatible API: conceptually,
you initialize it for particular matrices and hand off the work to the GPU.

As an example, my low-end laptop (the 300-dollar variety) runs an n-body 3D
model with several thousand masses without even taxing the CPU. MATMUL should
be doable.

The main GPU competitors are Nvidia, AMD, and Intel. OpenCL is supported on all
three.


More information about the Gcc-bugs mailing list