This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices
- From: Steve Kargl <sgk at troutmask dot apl dot washington dot edu>
- To: Jerry DeLisle <jvdelisle at charter dot net>
- Cc: "fortran at gcc dot gnu dot org" <fortran at gcc dot gnu dot org>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Sun, 13 Nov 2016 16:55:24 -0800
- Subject: Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices
- References: <2aad89ce-02e1-45bf-0bdc-d318e7995595@charter.net>
On Sun, Nov 13, 2016 at 04:08:50PM -0800, Jerry DeLisle wrote:
> Hi all,
>
> Attached patch implements a fast blocked matrix multiply. The basic algorithm is
> derived from netlib.org tuned blas dgemm. See matmul.m4 for reference.
>
> The matmul() function is compiled with -Ofast -funroll-loops. This can be
> customized further if there is an undesired optimization being used. This is
> accomplished using #pragma optimize ( string ).
>
Did you run any tests with '--param max-unroll-times=4', where the
4 could be replaced by some other value?  On troutmask, with my
code, I've found that 4 seems to work best with -funroll-loops.
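
For reference, a minimal sketch of the two ideas discussed above: a
blocked (tiled) matrix multiply, and per-function optimization options
set from the source file.  This is not the code in matmul.m4.  The
function name blocked_matmul, the tile size BLOCK, and the row-major
square-matrix layout are all assumptions made for illustration.  GCC
spells the pragma '#pragma GCC optimize'; the sketch uses "O3" and
"unroll-loops" as example option strings.

```c
#include <stddef.h>

/* Apply extra optimization only to the code between push and pop.
   Guarded so that other compilers simply skip the pragmas. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O3", "unroll-loops")
#endif

#define BLOCK 64  /* hypothetical tile size; tune for the target cache */

/* Compute C += A * B for n x n row-major matrices, one tile at a time,
   so that the tiles being worked on stay resident in cache. */
void blocked_matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                /* Clamp tile bounds so n need not be a multiple of BLOCK. */
                size_t imax = ii + BLOCK < n ? ii + BLOCK : n;
                size_t kmax = kk + BLOCK < n ? kk + BLOCK : n;
                size_t jmax = jj + BLOCK < n ? jj + BLOCK : n;

                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop: the one unrolling targets. */
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

To compare unroll limits, one option is to rebuild the file with
different values of '--param max-unroll-times=N' on the gcc command
line and time each build on the matrix sizes of interest.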
--
Steve