This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug fortran/68600] Inlined MATMUL is too slow.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-30
     Ever confirmed|0                           |1

--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I think you are seeing the effects of inefficiencies of assumed-shape arrays.
>
> If you want to use matmul on very small matrix sizes, it is best to
> use fixed-size explicit arrays.

Well, the problem is that MATMUL inlining is the default. IMO it should be
restricted to fixed-size explicit arrays (and small matrices?), at least for
the 6.1 version.
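
For readers unfamiliar with the array categories discussed here, the following is a minimal sketch of the three kinds of dummy-argument declarations. It is not taken from the attached benchmark; the module name `matmul_kinds`, the subroutine names, and the size `nfix` are hypothetical and chosen only for illustration.

```fortran
module matmul_kinds
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nfix = 4   ! hypothetical compile-time size
contains

  ! Fixed-size explicit shape: the bounds are compile-time constants.
  subroutine mm_fixed(a, b, c)
    real(dp), intent(in)  :: a(nfix, nfix), b(nfix, nfix)
    real(dp), intent(out) :: c(nfix, nfix)
    c = matmul(a, b)
  end subroutine mm_fixed

  ! Variable explicit shape: the bounds are passed as an argument,
  ! and the dummy arrays are contiguous.
  subroutine mm_variable(n, a, b, c)
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: a(n, n), b(n, n)
    real(dp), intent(out) :: c(n, n)
    c = matmul(a, b)
  end subroutine mm_variable

  ! Assumed shape: the bounds (and possibly non-unit strides) are only
  ! known at run time, from the actual argument.
  subroutine mm_assumed(a, b, c)
    real(dp), intent(in)  :: a(:, :), b(:, :)
    real(dp), intent(out) :: c(:, :)
    c = matmul(a, b)
  end subroutine mm_assumed

end module matmul_kinds

program demo
  use matmul_kinds
  implicit none
  real(dp) :: a(nfix, nfix), b(nfix, nfix)
  real(dp) :: c1(nfix, nfix), c2(nfix, nfix), c3(nfix, nfix)

  call random_number(a)
  call random_number(b)
  call mm_fixed(a, b, c1)
  call mm_variable(nfix, a, b, c2)
  call mm_assumed(a, b, c3)

  ! The three variants should agree up to rounding.
  print *, maxval(abs(c1 - c2)), maxval(abs(c1 - c3))
end program demo
```

Whether a given MATMUL call is actually inlined depends on the compiler version and options; gfortran documents `-finline-matmul-limit=n` for controlling this, with `-finline-matmul-limit=0` disabling inlining altogether.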

> Created attachment 36869
> Thomas program with a modified dgemm.
>
> The dgemm in this example is a stripped out version of an "optimized for cache"
> version from netlib.org.  I stripped out a lot of the unused code.

It is probably too late for 6.1, but the results are quite impressive
(~30Gflops/s peak):

[Book15] f90/bug% gfc -Ofast timing/matmul_sys_8jd.f90
[Book15] f90/bug% a.out
 Size     Loops          Matmul           dgemm          Matmul          Matmul
                          fixed                         assumed        variable
                       explicit                                        explicit
===============================================================================
    2    200000           0.969           0.104           0.360           0.368
    4    200000           5.821           0.774           1.381           1.049
    8    200000           5.415           2.970           2.316           2.342
   16    200000           6.455           4.917           2.738           3.225
   32    200000           7.332           5.964           2.893           4.117
   64     30757           5.565           7.277           2.785           3.830
  128      3829           4.790           7.982           2.981           4.384
  256       477           4.674           8.375           3.077           4.675
  512        59           4.797           8.200           3.156           4.786
 1024         7           3.967           8.370           2.896           4.050
 2048         1           3.693           8.414           2.804           3.650
[Book15] f90/bug% gfc -Ofast -mavx timing/matmul_sys_8jd.f90
[Book15] f90/bug% a.out
 Size     Loops          Matmul           dgemm          Matmul          Matmul
                          fixed                         assumed        variable
                       explicit                                        explicit
===============================================================================
    2    200000           0.956           0.106           0.372           0.469
    4    200000           7.805           0.715           1.334           1.462
    8    200000           7.520           3.222           2.292           3.482
   16    200000           3.001           6.406           2.671           4.917
   32    200000           8.886           8.530           2.900           6.136
   64     30757          10.203          10.998           2.677           6.770
  128      3829           6.742          13.367           2.831           6.774
  256       477           6.435          13.979           2.906           6.049
  512        59           6.592          15.041           2.991           6.273
 1024         7           5.247          14.639           2.775           4.922
 2048         1           4.309          13.976           2.739           4.176

Note a problem when 16x16 matrices are inlined with -mavx (I'll investigate and
file a PR for it).
