This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices


On Sun, Nov 13, 2016 at 04:08:50PM -0800, Jerry DeLisle wrote:
> Hi all,
> 
> Attached patch implements a fast blocked matrix multiply. The basic algorithm is 
> derived from netlib.org tuned blas dgemm. See matmul.m4 for reference.
> 
> The matmul() function is compiled with -Ofast -funroll-loops. This can be 
> customized further if there is an undesired optimization being used. This is 
> accomplished using #pragma optimize ( string ).
> 

Did you run any tests with '--param max-unroll-times=4', where
the 4 could be some other value?  On troutmask, with my code,
I've found that 4 seems to work best with -funroll-loops.
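
For anyone who wants to try this, here is a minimal sketch of the kind
of kernel one could time under different unroll limits.  It is a generic
tiled multiply, not the code from the patch or from netlib dgemm; the
name matmul_blocked and the tile size BLOCK are made up for illustration.
The pragma only enables unrolling for the function; the unroll limit
itself would be varied on the command line, e.g.
'gcc -O2 --param max-unroll-times=4 file.c'.

```c
#include <stddef.h>

/* Hypothetical tile size; a real value would be tuned per machine. */
#define BLOCK 32

/* Enable loop unrolling for the function below only (GCC-specific;
   other compilers will ignore or warn about these pragmas). */
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")

/* Computes C = C + A * B for n x n row-major matrices, working on
   BLOCK x BLOCK tiles so that each tile is reused while in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t imax = (ii + BLOCK < n) ? ii + BLOCK : n;
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t kmax = (kk + BLOCK < n) ? kk + BLOCK : n;
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t jmax = (jj + BLOCK < n) ? jj + BLOCK : n;
                for (size_t i = ii; i < imax; i++) {
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop: the candidate for unrolling. */
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}

#pragma GCC pop_options
```

The caller has to zero C first if a plain product is wanted.  Timing the
same source with max-unroll-times set to 2, 4, 8 and so on would show
whether 4 is also the sweet spot on other machines.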

-- 
Steve

