This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug fortran/68600] Inlined MATMUL is too slow.
- From: "dominiq at lps dot ens.fr" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Mon, 30 Nov 2015 11:14:59 +0000
- Subject: [Bug fortran/68600] Inlined MATMUL is too slow.
- Auto-submitted: auto-generated
- References: <bug-68600-4 at http dot gcc dot gnu dot org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
Dominique d'Humieres <dominiq at lps dot ens.fr> changed:
               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-30
     Ever confirmed|0                           |1
--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I think you are seeing the effects of inefficiencies of assumed-shape arrays.
>
> If you want to use matmul on very small matrix sizes, it is best to
> use fixed-size explicit arrays.
Well, the problem is that MATMUL inlining is the default. IMO it should be
restricted to fixed-size explicit arrays (and small matrices?), at least for
the 6.1 version.
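For readers unfamiliar with the array kinds compared in this thread, here is a
minimal sketch of the same product written with fixed-size explicit, variable
explicit and assumed-shape dummy arguments. It is not taken from the attached
test program; the module, routine and variable names (matmul_kinds, mm_fixed,
mm_variable, mm_assumed, nfix) are hypothetical.

```fortran
! Sketch only: all names below are hypothetical, not from the attachment.
module matmul_kinds
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nfix = 4
contains
  ! Fixed-size explicit shape: extents are compile-time constants.
  subroutine mm_fixed(a, b, c)
    real(dp), intent(in)  :: a(nfix, nfix), b(nfix, nfix)
    real(dp), intent(out) :: c(nfix, nfix)
    c = matmul(a, b)
  end subroutine mm_fixed

  ! Variable explicit shape: the extent is passed as an argument.
  subroutine mm_variable(n, a, b, c)
    integer, intent(in)   :: n
    real(dp), intent(in)  :: a(n, n), b(n, n)
    real(dp), intent(out) :: c(n, n)
    c = matmul(a, b)
  end subroutine mm_variable

  ! Assumed shape: extents and strides are taken from the actual argument.
  subroutine mm_assumed(a, b, c)
    real(dp), intent(in)  :: a(:, :), b(:, :)
    real(dp), intent(out) :: c(:, :)
    c = matmul(a, b)
  end subroutine mm_assumed
end module matmul_kinds

program demo
  use matmul_kinds
  implicit none
  real(dp) :: a(nfix, nfix), b(nfix, nfix)
  real(dp) :: c1(nfix, nfix), c2(nfix, nfix), c3(nfix, nfix)
  call random_number(a)
  call random_number(b)
  call mm_fixed(a, b, c1)
  call mm_variable(nfix, a, b, c2)
  call mm_assumed(a, b, c3)
  ! Largest differences between the three variants.
  print *, maxval(abs(c1 - c2)), maxval(abs(c1 - c3))
end program demo
```

Only in mm_fixed are the extents known at compile time. In mm_assumed they are
only known at run time, which is presumably where the overhead mentioned in the
quoted comment comes from.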
> Created attachment 36869 [details]
> Thomas program with a modified dgemm.
>
> The dgemm in this example is a stripped out version of an "optimized for cache"
> version from netlib.org. I stripped out a lot of the unused code.
It is probably too late for 6.1, but the results are quite impressive
(~30 Gflop/s peak):
[Book15] f90/bug% gfc -Ofast timing/matmul_sys_8jd.f90
[Book15] f90/bug% a.out
 Size   Loops           Matmul    dgemm     Matmul          Matmul
                fixed explicit              assumed        variable
                                                           explicit
==================================================================
    2  200000            0.969    0.104      0.360           0.368
    4  200000            5.821    0.774      1.381           1.049
    8  200000            5.415    2.970      2.316           2.342
   16  200000            6.455    4.917      2.738           3.225
   32  200000            7.332    5.964      2.893           4.117
   64   30757            5.565    7.277      2.785           3.830
  128    3829            4.790    7.982      2.981           4.384
  256     477            4.674    8.375      3.077           4.675
  512      59            4.797    8.200      3.156           4.786
 1024       7            3.967    8.370      2.896           4.050
 2048       1            3.693    8.414      2.804           3.650
[Book15] f90/bug% gfc -Ofast -mavx timing/matmul_sys_8jd.f90
[Book15] f90/bug% a.out
 Size   Loops           Matmul    dgemm     Matmul          Matmul
                fixed explicit              assumed        variable
                                                           explicit
==================================================================
    2  200000            0.956    0.106      0.372           0.469
    4  200000            7.805    0.715      1.334           1.462
    8  200000            7.520    3.222      2.292           3.482
   16  200000            3.001    6.406      2.671           4.917
   32  200000            8.886    8.530      2.900           6.136
   64   30757           10.203   10.998      2.677           6.770
  128    3829            6.742   13.367      2.831           6.774
  256     477            6.435   13.979      2.906           6.049
  512      59            6.592   15.041      2.991           6.273
 1024       7            5.247   14.639      2.775           4.922
 2048       1            4.309   13.976      2.739           4.176
Note the problem when 16x16 matrices are inlined with -mavx: the fixed explicit
rate drops to 3.001, against 6.455 without -mavx (I'll investigate and file a
PR for it).
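As a rough illustration of how rates like those in the tables can be obtained,
here is a sketch that times MATMUL and converts the time to Gflop/s. It assumes
the conventional count of 2*n**3 floating-point operations per n x n product;
the attached program may count or time things differently, and all names
(gflops_sketch, checksum, ...) are hypothetical.

```fortran
! Sketch only: not the attached benchmark; names are hypothetical.
program gflops_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 64, loops = 1000
  real(dp) :: a(n, n), b(n, n), c(n, n)
  real(dp) :: seconds, gflops, checksum
  integer(kind=8) :: t0, t1, rate
  integer :: i

  call random_number(a)
  call random_number(b)
  checksum = 0.0_dp

  call system_clock(t0, rate)
  do i = 1, loops
     c = matmul(a, b)
     ! Perturb an input and use a result so the loop is not trivially removed.
     a(1, 1) = a(1, 1) + 1.0e-9_dp
     checksum = checksum + c(1, 1)
  end do
  call system_clock(t1)

  ! Guard against a zero elapsed time on coarse clocks.
  seconds = max(real(t1 - t0, dp) / real(rate, dp), tiny(1.0_dp))
  ! Assumed flop count: n**3 multiplies plus n**3 adds per product.
  gflops = 2.0_dp * real(n, dp)**3 * real(loops, dp) / (seconds * 1.0e9_dp)
  print '(a, i0, a, f8.3, a, es12.4)', 'n = ', n, '  Gflop/s = ', gflops, &
        '  checksum = ', checksum
end program gflops_sketch
```

The loop count is scaled down for large n in the tables above, presumably to
keep the run time per size roughly constant; this sketch uses a single size.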