This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug fortran/68600] Inlined MATMUL is too slow.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600

--- Comment #5 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Another interesting data point.  I deleted the DGEMM implementation from
the file and linked against the serial version of OpenBLAS. OK,
OpenBLAS is based on GotoBLAS, so we have to expect a hit
for large matrices.

Figures:

ig25@linux-fd1f:~/Krempel/Bench> gfortran -O2 -funroll-loops bench-3.f90 -lopenblas_serial
ig25@linux-fd1f:~/Krempel/Bench> ./a.out
 Size     Loops          Matmul           dgemm         Matmul          Matmul
                      fixed explicit                    assumed      variable
                                                                     explicit
=====================================================================================
    2    200000          11.944           0.035           0.136           0.412
    4    200000           1.712           0.257           0.458           0.738
    8    200000           2.080           1.162           0.824           1.077
   16    200000           1.697           3.104           0.939           0.995
   32    200000           1.450           4.814           1.388           1.426
   64     30757           1.485           5.978           1.351           1.371
  128      3829           1.557           6.857           1.534           1.522
  256       477           1.568           7.017           1.589           1.537
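
As a cross-check on the figures above, here is a minimal Python sketch that finds, for each MATMUL column, the first size at which the direct dgemm call is ahead. It assumes that larger figures are better (the output does not state units). The names `first_run` and `crossover` are only illustrative and are not part of the benchmark.

```python
# Rows copied from the first run: (size, fixed explicit, dgemm, assumed, variable explicit).
# Assumption: larger figures mean better performance; the output states no units.
first_run = [
    (2, 11.944, 0.035, 0.136, 0.412),
    (4, 1.712, 0.257, 0.458, 0.738),
    (8, 2.080, 1.162, 0.824, 1.077),
    (16, 1.697, 3.104, 0.939, 0.995),
    (32, 1.450, 4.814, 1.388, 1.426),
    (64, 1.485, 5.978, 1.351, 1.371),
    (128, 1.557, 6.857, 1.534, 1.522),
    (256, 1.568, 7.017, 1.589, 1.537),
]

DGEMM = 2  # index of the dgemm column in each row


def crossover(rows, column):
    """Return (last size where MATMUL is ahead, first size where dgemm is ahead)."""
    previous_size = None
    for row in rows:
        if row[DGEMM] > row[column]:
            return previous_size, row[0]
        previous_size = row[0]
    return previous_size, None  # dgemm never overtakes in the measured range


for name, column in (("fixed explicit", 1), ("assumed", 3), ("variable explicit", 4)):
    print(name, crossover(first_run, column))
```

On these numbers the fixed explicit column changes sides between 8 and 16, while the assumed and variable explicit columns already fall behind dgemm between 4 and 8.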

So far so good.  Looks as if the crossover point for the inline and the dgemm 
version is between 8 and 16, so let us try this:

ig25@linux-fd1f:~/Krempel/Bench> gfortran -O2 -funroll-loops -finline-matmul-limit=12 -fexternal-blas bench-3.f90 -lopenblas_serial
ig25@linux-fd1f:~/Krempel/Bench> ./a.out
 Size     Loops          Matmul           dgemm         Matmul          Matmul
                      fixed explicit                    assumed      variable
                                                                     explicit
=====================================================================================
    2    200000          11.948           0.039           0.156           0.464
    4    200000           1.999           0.305           0.542           0.859
    8    200000           2.435           1.359           0.962           1.255
   16    200000           0.802           3.102           0.798           0.799
   32    200000           4.878           4.990           4.906           4.906
   64     30757           6.045           6.062           5.977           5.968

So, if the user really wants us to call an external BLAS, we had better
do so directly and not through our library routines: at size 16, the
three MATMUL columns reach only about 0.8, against 3.1 for the direct
dgemm call.
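
To put a number on that gap, the following Python sketch divides each MATMUL figure from the second run by the dgemm figure in the same row, for the sizes above the inline limit of 12. It again assumes that larger figures are better. `second_run` and `relative_to_dgemm` are names made up for this illustration.

```python
# Rows copied from the second run (-finline-matmul-limit=12 -fexternal-blas),
# restricted to sizes above the inline limit:
# (size, fixed explicit, dgemm, assumed, variable explicit).
second_run = [
    (16, 0.802, 3.102, 0.798, 0.799),
    (32, 4.878, 4.990, 4.906, 4.906),
    (64, 6.045, 6.062, 5.977, 5.968),
]


def relative_to_dgemm(row):
    """Return (size, MATMUL figures divided by the dgemm figure of the same row)."""
    size, fixed, dgemm, assumed, variable = row
    return size, tuple(round(x / dgemm, 2) for x in (fixed, assumed, variable))


ratios = dict(relative_to_dgemm(row) for row in second_run)
for size, values in ratios.items():
    print(size, values)
```

At sizes 32 and 64 the MATMUL columns stay within a few percent of the direct dgemm call, while at size 16 they reach only about a quarter of it.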
